diff --git a/README.md b/README.md
index 821b07c10..e32e41fc9 100644
--- a/README.md
+++ b/README.md
@@ -115,9 +115,6 @@
 or via `conda`:

     $ conda install -c conda-forge atom-ml

-| NOTE: Since atom was already taken, download the package under the name `atom-ml`! |
-| --- |
-

diff --git a/atom/api.py b/atom/api.py
index 11b096d7e..52b4708d2 100644
--- a/atom/api.py
+++ b/atom/api.py
@@ -29,7 +29,7 @@ def ATOMModel(
     fullname: str = None,
     needs_scaling: bool = False,
 ):
-    """Convert an estimator to a model that can be ingested by ATOM.
+    """Convert an estimator to a model that can be ingested by atom.

     This function adds the relevant attributes to the estimator so that
     they can be used when initializing the CustomModel class.

diff --git a/docs/API/ATOM/atomclassifier/index.html b/docs/API/ATOM/atomclassifier/index.html
index 8f4284db0..8b2d273bc 100644
--- a/docs/API/ATOM/atomclassifier/index.html
+++ b/docs/API/ATOM/atomclassifier/index.html
@@ -1061,7 +1061,7 @@

Utility methods


method add(transformer, columns=None, train_only=False)
-
[source]
+
[source]

Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's @@ -1104,7 +1104,7 @@
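A minimal sketch of how add can be used with an sklearn transformer (the dataset and the PCA settings are illustrative):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    # The unfitted transformer is fitted on the training set, after which
    # the dataset in the current branch is transformed.
    atom.add(PCA(n_components=5))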

Utility methods


method apply(func, column)
-
[source]
+
[source]

Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the @@ -1133,7 +1133,7 @@
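A minimal sketch of apply, assuming the column can be referenced by its index (the function and column are illustrative):

    import numpy as np
    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    # Transform the first column with a numpy function (a lambda works too).
    atom.apply(np.log, column=0)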

Utility methods


method automl(**kwargs)
-
[source]
+
[source]

Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are @@ -1152,7 +1152,7 @@
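A hedged sketch of automl, assuming the keyword arguments are forwarded to the underlying TPOT estimator (the values shown are illustrative):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=1)

    # Let TPOT search for transformers and a final estimator for at most 5 minutes.
    atom.automl(scoring="accuracy", max_time_mins=5, random_state=1)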

Utility methods


method calibrate(**kwargs)
-
[source]
+
[source]

Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, @@ -1174,7 +1174,7 @@
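A sketch of calibrate, assuming a model was first trained through atom.run and that the keyword arguments are passed on to sklearn's CalibratedClassifierCV (values are illustrative):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.run(models=["AdaB"], metric="f1")
    atom.calibrate(method="sigmoid", cv=5)  # Platt scaling with 5-fold CV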

Utility methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
[source]
+
[source]

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.
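A sketch of canvas drawing two plots side by side, assuming two models (AdaB and ET) were trained with atom.run beforehand:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)
    atom.run(models=["AdaB", "ET"], metric="f1")

    # Both plots end up in one figure with two columns.
    with atom.canvas(1, 2, title="AdaB vs ET", figsize=(12, 5)):
        atom.adab.plot_permutation_importance()
        atom.et.plot_permutation_importance()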

@@ -1212,7 +1212,7 @@

Utility methods


method delete(models=None)
-
[source]
+
[source]

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1229,7 +1229,7 @@

Utility methods


method distribution(column=0)
-
[source]
+
[source]

Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored.
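A sketch of distribution; the exact shape of the returned object is not shown here, but it reports the KS-statistic per tested distribution for the requested column:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    # KS-statistics of several theoretical distributions fitted on column 0.
    print(atom.distribution(column=0))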

@@ -1259,7 +1259,7 @@

Utility methods


method drop(columns)
-
[source]
+
[source]

Drop columns from the dataset.

Note

@@ -1280,7 +1280,7 @@

Utility methods


method export_pipeline(model=None)
-
[source]
+
[source]

Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler @@ -1307,7 +1307,7 @@
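A sketch of export_pipeline, assuming a model was trained with atom.run first; the result is a regular sklearn Pipeline that can be applied to new data:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.scale()                            # a transformer in atom's pipeline
    atom.run(models=["AdaB"], metric="f1")

    pipeline = atom.export_pipeline(model="AdaB")  # scaler + fitted AdaB
    print(pipeline.predict(X[:5]))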

Utility methods


method get_class_weight(dataset="train")
-
[source]
+
[source]

Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -1333,7 +1333,7 @@
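A sketch of get_class_weight; the printed values are illustrative:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    # Mapping of class label to weight, inversely proportional to frequency.
    print(atom.get_class_weight(dataset="train"))  # e.g. {0: 1.3, 1: 0.8}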

Utility methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -1352,7 +1352,7 @@

Utility methods


method report(dataset="dataset", n_rows=None, filename=None)
-
[source]
+
[source]

Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k.
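A sketch of report; profiling only a sample of rows keeps it fast, and passing a filename writes the rendered report to disk:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.report(dataset="train", n_rows=500, filename="train_profile")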

@@ -1389,13 +1389,13 @@

Utility methods




method reset_predictions()
-
[source]
+
[source]

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


method save(filename=None, save_data=True)
-
[source]
+
[source]

Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False.
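A sketch of save; reloading through the pickle module is an assumption based on the stored format being described as a pickle file, and the file name and suffix below are illustrative:

    import pickle

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.save("atom_run", save_data=False)  # keep the file small

    # Reload later; the exact suffix written by save may differ.
    with open("atom_run.pkl", "rb") as f:
        atom2 = pickle.load(f)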

@@ -1418,7 +1418,7 @@

Utility methods


method save_data(filename=None, dataset="dataset")
-
[source]
+
[source]

Save the data in the current branch to a csv file.
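A one-line sketch of save_data (the file name is illustrative):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.save_data("train_set.csv", dataset="train")  # training set of the current branch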

@@ -1437,7 +1437,7 @@

Utility methods


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Print all the models' scoring for a specific metric.
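A sketch of scoring, assuming models were trained with atom.run; any scorer name known to sklearn can be requested:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)
    atom.run(models=["AdaB", "ET"], metric="f1")

    atom.scoring()                           # metric used during training
    atom.scoring("roc_auc", dataset="test")  # any scorer from sklearn's SCORERS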

@@ -1468,7 +1468,7 @@

Utility methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
[source]
+
[source]

Add a Stacking instance to the models in the pipeline.
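A sketch of stacking, assuming AdaB and ET were trained with atom.run; leaving estimator=None falls back to the default final estimator:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)
    atom.run(models=["AdaB", "ET"], metric="f1")

    # Combine the two trained models into one Stacking instance.
    atom.stacking(models=["AdaB", "ET"], passthrough=False)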

@@ -1505,19 +1505,19 @@

Utility methods


method stats()
-
[source]
+
[source]

Print basic information about the dataset.


method status()
-
[source]
+
[source]

Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger.


method voting(models=None, weights=None)
-
[source]
+
[source]

Add a Voting instance to the models in the pipeline.
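A sketch of voting, assuming the same two trained models; the weights are illustrative:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)
    atom.run(models=["AdaB", "ET"], metric="f1")

    # AdaB's vote counts twice as much as ET's.
    atom.voting(models=["AdaB", "ET"], weights=[2, 1])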

@@ -1582,14 +1582,14 @@

Data cleaning


method scale(strategy="standard")
-
[source]
+
[source]

Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class.
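A sketch of scale using the default strategy; see the Scaler class for the other accepted strategies:

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    atom = ATOMClassifier(X, y, verbose=0)

    atom.scale()                # standardize all numerical columns
    print(atom.dataset.head())  # the branch's data is updated in place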


method clean(prohibited_types=None, strip_categorical=True, maximum_cardinality=True,
-             minimum_cardinality=True, missing_target=True, encode_target=None) 
-
[source]
+             minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None)
+
[source]

Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are:

diff --git a/docs/API/ATOM/atomregressor/index.html b/docs/API/ATOM/atomregressor/index.html index 28f877c9a..e6be6c8ce 100644 --- a/docs/API/ATOM/atomregressor/index.html +++ b/docs/API/ATOM/atomregressor/index.html @@ -606,7 +606,7 @@

ATOMRegressor


class atom.api.ATOMRegressor(*arrays, y=-1, n_rows=1, test_size=0.2, logger=None,
                              n_jobs=1, warnings=True, verbose=0, random_state=None)
-
[source]
+
[source]

ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already @@ -1039,7 +1039,7 @@

Utility methods


method add(transformer, columns=None, train_only=False)
-
[source]
+
[source]

Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's @@ -1082,7 +1082,7 @@

Utility methods


method apply(func, column)
-
[source]
+
[source]

Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the @@ -1111,7 +1111,7 @@

Utility methods


method automl(**kwargs)
-
[source]
+
[source]

Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are @@ -1130,7 +1130,7 @@

Utility methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
[source]
+
[source]

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1168,7 +1168,7 @@

Utility methods


method delete(models=None)
-
[source]
+
[source]

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1185,7 +1185,7 @@

Utility methods


method distribution(column=0)
-
[source]
+
[source]

Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored.

@@ -1215,7 +1215,7 @@

Utility methods


method drop(columns)
-
[source]
+
[source]

Drop columns from the dataset.

Note

@@ -1236,7 +1236,7 @@

Utility methods


method export_pipeline(model=None)
-
+
[source]

Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler @@ -1263,7 +1263,7 @@

Utility methods


method log(msg, level=0)
-
+
[source]

Write a message to the logger and print it to stdout.

Parameters:
@@ -1282,7 +1282,7 @@

Utility methods


method report(dataset="dataset", n_rows=None, filename=None)
-
[source]
+
[source]

Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k.

@@ -1319,13 +1319,13 @@

Utility methods




method reset_predictions()
-
[source]
+
[source]

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


method save(filename=None, save_data=True)
-
[source]
+
[source]

Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1348,7 +1348,7 @@

Utility methods


method save_data(filename=None, dataset="dataset")
-
[source]
+
[source]

Save the data in the current branch to a csv file.

@@ -1367,7 +1367,7 @@

Utility methods


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Print all the models' scoring for a specific metric.

@@ -1386,7 +1386,7 @@

Utility methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
[source]
+
[source]

Add a Stacking instance to the models in the pipeline.

@@ -1423,19 +1423,19 @@

Utility methods


method stats()
-
[source]
+
[source]

Print basic information about the dataset.


method status()
-
[source]
+
[source]

Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger.


method voting(models=None, weights=None)
-
[source]
+
[source]

Add a Voting instance to the models in the pipeline.

@@ -1495,14 +1495,14 @@

Data cleaning


method scale(strategy="standard")
-
[source]
+
[source]

Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class.


method clean(prohibited_types=None, strip_categorical=True, maximum_cardinality=True,
-             minimum_cardinality=True, missing_target=True, encode_target=None) 
-
[source]
+             minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None)
+
[source]

Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are:

@@ -746,7 +746,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -784,7 +784,7 @@

Methods


method transform(X, y) 
-
[source]
+
[source]

Oversample or undersample the data.

diff --git a/docs/API/data_cleaning/cleaner/index.html b/docs/API/data_cleaning/cleaner/index.html index 2010bfdf8..7f6ef05e8 100644 --- a/docs/API/data_cleaning/cleaner/index.html +++ b/docs/API/data_cleaning/cleaner/index.html @@ -591,7 +591,7 @@

Cleaner

class atom.data_cleaning.Cleaner(prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True,
                                  strip_categorical=True, drop_duplicates=False, missing_target=True,
                                  encode_target=True, verbose=0, logger=None)
-
[source]
+
[source]

Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are:

@@ -802,7 +802,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -843,7 +843,7 @@

Methods


method transform(X, y=None) 
-
[source]
+
[source]

Apply the data cleaning steps on the data.

diff --git a/docs/API/data_cleaning/encoder/index.html b/docs/API/data_cleaning/encoder/index.html index 4f4753307..7dff221cc 100644 --- a/docs/API/data_cleaning/encoder/index.html +++ b/docs/API/data_cleaning/encoder/index.html @@ -586,7 +586,7 @@

Encoder


class atom.data_cleaning.Encoder(strategy="LeaveOneOut", max_onehot=10,
                                  frac_to_other=None, verbose=0, logger=None, **kwargs)
-
[source]
+
[source]

Perform encoding of categorical features. The encoding type depends on the number of classes in the column:

@@ -783,7 +783,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -802,7 +802,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -840,7 +840,7 @@

Methods


method transform(X, y=None) 
-
[source]
+
[source]

Encode the data.

diff --git a/docs/API/data_cleaning/imputer/index.html b/docs/API/data_cleaning/imputer/index.html index 6d1122bff..260271bc2 100644 --- a/docs/API/data_cleaning/imputer/index.html +++ b/docs/API/data_cleaning/imputer/index.html @@ -590,7 +590,7 @@

Imputer


class atom.data_cleaning.Imputer(strat_num="drop", strat_cat="drop", min_frac_rows=None,
                                  min_frac_cols=None, verbose=0, logger=None)
-
[source]
+
[source]

Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the missing attribute to customize what are considered "missing @@ -720,7 +720,7 @@

Methods


method fit(X, y=None) 
-
[source]
+
[source]

Fit to data.

@@ -809,7 +809,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -828,7 +828,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -866,7 +866,7 @@

Methods


method transform(X, y=None) 
-
[source]
+
[source]

Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation.

diff --git a/docs/API/data_cleaning/pruner/index.html b/docs/API/data_cleaning/pruner/index.html index b557f2b56..7c2e9c3c9 100644 --- a/docs/API/data_cleaning/pruner/index.html +++ b/docs/API/data_cleaning/pruner/index.html @@ -590,7 +590,7 @@

Pruner


class atom.data_cleaning.Pruner(strategy="z-score", method="drop", max_sigma=3,
                                 include_target=False, verbose=0, logger=None, **kwargs)
-
[source]
+
[source]

Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed @@ -736,7 +736,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -755,7 +755,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -793,7 +793,7 @@

Methods


method transform(X, y=None) 
-
[source]
+
[source]

Apply the outlier strategy on the data.

diff --git a/docs/API/data_cleaning/scaler/index.html b/docs/API/data_cleaning/scaler/index.html index 044b190fe..3ee3445a7 100644 --- a/docs/API/data_cleaning/scaler/index.html +++ b/docs/API/data_cleaning/scaler/index.html @@ -589,7 +589,7 @@

Scaler


class atom.data_cleaning.Scaler(strategy="standard", verbose=0, logger=None)
-
[source]
+
[source]

This class applies one of sklearn's scalers. It also returns a dataframe when provided, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the @@ -692,7 +692,7 @@
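A sketch of using the Scaler class on its own, outside of atom (the dataset is illustrative):

    from atom.data_cleaning import Scaler
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    scaler = Scaler(strategy="standard", verbose=1)
    scaler.fit(X)
    X_scaled = scaler.transform(X)  # returns a dataframe when given one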

Methods


method fit(X, y=None) 
-
[source]
+
[source]

Compute the mean and std to be used for later scaling.

@@ -769,7 +769,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -788,7 +788,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

@@ -826,7 +826,7 @@

Methods


method transform(X, y=None) 
-
[source]
+
[source]

Perform standardization by centering and scaling.

diff --git a/docs/API/feature_engineering/feature_generator/index.html b/docs/API/feature_engineering/feature_generator/index.html index 8918a178b..b7c70810b 100644 --- a/docs/API/feature_engineering/feature_generator/index.html +++ b/docs/API/feature_engineering/feature_generator/index.html @@ -838,7 +838,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -857,7 +857,7 @@

Methods


method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

diff --git a/docs/API/feature_engineering/feature_selector/index.html b/docs/API/feature_engineering/feature_selector/index.html index df7af46a7..d4bc2d485 100644 --- a/docs/API/feature_engineering/feature_selector/index.html +++ b/docs/API/feature_engineering/feature_selector/index.html @@ -950,7 +950,7 @@

Methods


method log(msg, level=0)
-
[source]
+
[source]

Write a message to the logger and print it to stdout.

@@ -969,19 +969,19 @@

Methods


method plot_pca(title=None, figsize=(10, 6), filename=None, display=True)
-
[source]
+
[source]

Plot the explained variance ratio vs the number of components. See plot_pca for a description of the parameters.


method plot_components(show=None, title=None, figsize=None, filename=None, display=True)
-
[source]
+
[source]

Plot the explained variance ratio per components. See plot_components for a description of the parameters.


method plot_rfecv(title=None, figsize=(10, 6), filename=None, display=True)
-
[source]
+
[source]

Plot the scores obtained by the estimator fitted on every subset of the data. See plot_rfecv for a description of the parameters.


@@ -992,7 +992,7 @@

Methods




method save(filename=None)
-
[source]
+
[source]

Save the instance to a pickle file.

diff --git a/docs/API/models/adab/index.html b/docs/API/models/adab/index.html index ca4c5506b..456f3245e 100644 --- a/docs/API/models/adab/index.html +++ b/docs/API/models/adab/index.html @@ -593,9 +593,9 @@

AdaBoost (AdaB)


AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on - the original dataset and then fits additional copies of the algorithm on the - same dataset but where the weights of instances are adjusted according to the - error of the current prediction.

+the original dataset and then fits additional copies of the algorithm on the +same dataset but where the weights of instances are adjusted according to the +error of the current prediction.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -842,8 +844,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.adab.plot_permutation_importance() - or atom.adab.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.adab.plot_permutation_importance() +or atom.adab.predict(X).The remaining utility methods can be found hereunder:

@@ -879,13 +881,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -902,13 +904,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -921,13 +924,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -936,7 +939,7 @@

Methods

metric: str or None, optional (default=None)
Name of the metric to calculate. Choose from any of sklearn's SCORERS - or one of the following custom metrics (only if classifier): +or one of the following custom metrics (only if classifier):
  • "cm" for the confusion matrix.
  • "tn" for true negatives.
  • @@ -970,7 +973,7 @@

    Methods


    method save_estimator(filename=None)
    -
    +
    [source]

    Save the estimator to a pickle file.

diff --git a/docs/API/models/ard/index.html b/docs/API/models/ard/index.html index 7ca1d6a75..1c3317100 100644 --- a/docs/API/models/ard/index.html +++ b/docs/API/models/ard/index.html @@ -593,9 +593,9 @@

Automatic Relevance Determination (ARD)


Automatic Relevance Determination is very similar to Bayesian Ridge, but - can lead to sparser coefficients. Fit the weights of a regression model, using an - ARD prior. The weights of the regression model are assumed to be in Gaussian - distributions.

+can lead to sparser coefficients. Fit the weights of a regression model, using an +ARD prior. The weights of the regression model are assumed to be in Gaussian +distributions.

Corresponding estimators are:

@@ -732,7 +732,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -744,8 +745,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -757,7 +758,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -767,9 +769,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -787,9 +789,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -817,8 +819,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.ard.plot_permutation_importance() - or atom.ard.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.ard.plot_permutation_importance() +or atom.ard.predict(X).The remaining utility methods can be found hereunder:

@@ -849,13 +851,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

@@ -868,13 +871,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -905,7 +908,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/bag/index.html b/docs/API/models/bag/index.html index f160c202f..3639527a7 100644 --- a/docs/API/models/bag/index.html +++ b/docs/API/models/bag/index.html @@ -592,12 +592,13 @@

Bagging (Bag)


-

Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on - random subsets of the original dataset and then aggregate their individual predictions - (either by voting or by averaging) to form a final prediction. Such a meta-estimator - can typically be used as a way to reduce the variance of a black-box estimator - (e.g., a decision tree), by introducing randomization into its construction - procedure and then making an ensemble out of it.

+

Bagging uses an ensemble meta-estimator that fits base classifiers/regressors +each on random subsets of the original dataset and then aggregate their +individual predictions (either by voting or by averaging) to form a final +prediction. Such a meta-estimator can typically be used as a way to reduce +the variance of a black-box estimator (e.g., a decision tree), +by introducing randomization into its construction procedure and then +making an ensemble out of it.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -839,8 +842,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.bag.plot_permutation_importance() - or atom.bag.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.bag.plot_permutation_importance() +or atom.bag.predict(X).The remaining utility methods can be found hereunder:

@@ -876,13 +879,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -899,13 +902,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -918,13 +922,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -967,7 +971,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/bnb/index.html b/docs/API/models/bnb/index.html index 927390d66..caae55859 100644 --- a/docs/API/models/bnb/index.html +++ b/docs/API/models/bnb/index.html @@ -592,10 +592,10 @@

Bernoulli Naive Bayes (BNB)


-

Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli - models. Like Multinomial Naive bayes (MNB), this classifier is suitable for - discrete data. The difference is that while MNB works with occurrence counts, BNB - is designed for binary/boolean features.

+

Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate +Bernoulli models. Like Multinomial Naive bayes (MNB), this +classifier is suitable for discrete data. The difference is that while +MNB works with occurrence counts, BNB is designed for binary/boolean features.

Corresponding estimators are:

@@ -721,7 +721,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -733,8 +734,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -746,7 +747,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -756,9 +758,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -776,9 +778,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -822,8 +824,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X). +The remaining utility methods can be found hereunder:

@@ -859,13 +861,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -882,13 +884,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -901,13 +904,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -950,7 +953,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/br/index.html b/docs/API/models/br/index.html index b8f3f0126..e7040829a 100644 --- a/docs/API/models/br/index.html +++ b/docs/API/models/br/index.html @@ -592,9 +592,9 @@

Bayesian Ridge (BR)


-

Bayesian regression techniques can be used to include regularization parameters in the - estimation procedure: the regularization parameter is not set in a hard sense but - tuned to the data at hand.

+

Bayesian regression techniques can be used to include regularization +parameters in the estimation procedure: the regularization parameter +is not set in a hard sense but tuned to the data at hand.

Corresponding estimators are:

@@ -731,7 +731,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -743,8 +744,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -756,7 +757,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -766,9 +768,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -786,9 +788,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -816,8 +818,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.br.plot_permutation_importance() - or atom.br.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.br.plot_permutation_importance() +or atom.br.predict(X).The remaining utility methods can be found hereunder:

@@ -848,13 +850,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

@@ -867,13 +870,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -904,7 +907,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/catb/index.html b/docs/API/models/catb/index.html index 6ed85d9e7..b106372f6 100644 --- a/docs/API/models/catb/index.html +++ b/docs/API/models/catb/index.html @@ -592,8 +592,8 @@

CatBoost (CatB)


-

CatBoost is a machine learning method based on gradient boosting over decision trees. - Main advantages of CatBoost:

+

CatBoost is a machine learning method based on gradient boosting over +decision trees. Main advantages of CatBoost:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -861,8 +864,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X). +The remaining utility methods can be found hereunder:

@@ -898,13 +901,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -921,13 +924,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -940,13 +944,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -989,7 +993,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/catnb/index.html b/docs/API/models/catnb/index.html index 556dbcb31..88f5e3033 100644 --- a/docs/API/models/catnb/index.html +++ b/docs/API/models/catnb/index.html @@ -592,7 +592,8 @@

Categorical Naive Bayes (CatNB)


-

Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features.

+

Categorical Naive Bayes implements the Naive Bayes algorithm for +categorical features.

Corresponding estimators are:

@@ -718,7 +719,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -730,8 +732,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -743,7 +745,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -753,9 +756,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -773,9 +776,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -819,8 +822,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X). +The remaining utility methods can be found hereunder:

@@ -856,13 +859,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -879,13 +882,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -898,13 +902,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -947,7 +951,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/cnb/index.html b/docs/API/models/cnb/index.html index 6247a0298..dd9e6fee1 100644 --- a/docs/API/models/cnb/index.html +++ b/docs/API/models/cnb/index.html @@ -592,9 +592,9 @@

Complement Naive Bayes (CNB)


-

The Complement Naive Bayes classifier was designed to correct the “severe assumptions” - made by the standard Multinomial Naive Bayes classifier. It is particularly - suited for imbalanced data sets.

+

The Complement Naive Bayes classifier was designed to correct the +“severe assumptions” made by the standard Multinomial Naive Bayes +classifier. It is particularly suited for imbalanced data sets.

Corresponding estimators are:

@@ -724,7 +724,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -736,8 +737,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -749,7 +750,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -759,9 +761,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -779,9 +781,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -825,8 +827,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X). +The remaining utility methods can be found hereunder:

@@ -862,13 +864,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -885,13 +887,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -904,13 +907,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -953,7 +956,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/en/index.html b/docs/API/models/en/index.html index 4a9245141..da13e11d9 100644 --- a/docs/API/models/en/index.html +++ b/docs/API/models/en/index.html @@ -603,9 +603,9 @@

Elastic Net (EN)

Hyperparameters


@@ -722,7 +722,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -734,8 +735,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -747,7 +748,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -757,9 +759,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame
+results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include:
+Series of the training results. Columns include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -777,9 +779,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -807,8 +809,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.en.plot_permutation_importance() - or atom.en.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.en.plot_permutation_importance() + or atom.en.predict(X).The remaining utility methods can be found hereunder:

@@ -839,13 +841,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

@@ -858,13 +861,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -895,7 +898,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/et/index.html b/docs/API/models/et/index.html index 05aedf8c9..e2ef4337b 100644 --- a/docs/API/models/et/index.html +++ b/docs/API/models/et/index.html @@ -592,9 +592,10 @@

Extra-Trees (ET)


-

Extra-Trees use a meta estimator that fits a number of randomized decision trees - (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging - to improve the predictive accuracy and control over-fitting.

+

Extra-Trees use a meta estimator that fits a number of randomized +decision trees (a.k.a. extra-trees) on various sub-samples of the +dataset and uses averaging to improve the predictive accuracy and +control over-fitting.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -858,8 +861,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.et.plot_permutation_importance() - or atom.et.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.et.plot_permutation_importance() +or atom.et.predict(X).The remaining utility methods can be found hereunder:

@@ -895,13 +898,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -918,13 +921,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -937,13 +941,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -986,7 +990,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.
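To round off the Extra-Trees utility methods, a small sketch of rename and reset_predictions; the exact format of the renamed tag and the name attribute are assumptions.

```python
model = atom.et              # the trained Extra-Trees model from the sketch above
model.reset_predictions()    # drop cached prediction attributes to free memory
model.rename("tuned")        # new tag; the ET acronym stays at the beginning
print(model.name)            # something like "ET_tuned" (exact format assumed)
```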

diff --git a/docs/API/models/gbm/index.html b/docs/API/models/gbm/index.html index 95274c5e6..274e05d44 100644 --- a/docs/API/models/gbm/index.html +++ b/docs/API/models/gbm/index.html @@ -592,11 +592,12 @@

Gradient Boosting Machine (GBM)


-

A Gradient Boosting Machine builds an additive model in a forward stage-wise - fashion; it allows for the optimization of arbitrary differentiable loss - functions. In each stage n_classes_ regression trees are fit on the negative - gradient of the binomial or multinomial deviance loss function. Binary - classification is a special case where only a single regression tree is induced.

+

A Gradient Boosting Machine builds an additive model in a forward +stage-wise fashion; it allows for the optimization of arbitrary +differentiable loss functions. In each stage n_classes_ regression +trees are fit on the negative gradient of the binomial or multinomial +deviance loss function. Binary classification is a special case where +only a single regression tree is induced.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -876,8 +879,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() - or atom.gbm.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() +or atom.gbm.predict(X). The remaining utility methods can be found hereunder:

@@ -913,13 +916,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -936,13 +939,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -955,13 +959,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -1004,7 +1008,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/gnb/index.html b/docs/API/models/gnb/index.html index ab62a2909..990581eaa 100644 --- a/docs/API/models/gnb/index.html +++ b/docs/API/models/gnb/index.html @@ -592,8 +592,9 @@

Gaussian Naive Bayes (GNB)


-

Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The - likelihood of the features is assumed to be Gaussian.

+

Gaussian Naive Bayes implements the Naive Bayes algorithm for +classification. The likelihood of the features is assumed to +be Gaussian.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -782,8 +784,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X). +The remaining utility methods can be found hereunder:

@@ -819,13 +821,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -842,13 +844,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -861,13 +864,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -910,7 +913,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/gp/index.html b/docs/API/models/gp/index.html index db726b97e..abdba62ff 100644 --- a/docs/API/models/gp/index.html +++ b/docs/API/models/gp/index.html @@ -592,19 +592,22 @@

Gaussian Process (GP)


-

Gaussian Processes are a generic supervised learning method designed to solve - regression and probabilistic classification problems. The advantages of Gaussian processes are:

+

Gaussian Processes are a generic supervised learning method designed +to solve regression and probabilistic classification problems. The +advantages of Gaussian processes are:

The disadvantages of Gaussian processes include:

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -795,8 +799,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X). +The remaining utility methods can be found hereunder:

@@ -832,13 +836,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -855,13 +859,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -874,13 +879,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -923,7 +928,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/knn/index.html b/docs/API/models/knn/index.html index 65e057ecf..064d7c537 100644 --- a/docs/API/models/knn/index.html +++ b/docs/API/models/knn/index.html @@ -592,9 +592,10 @@

K-Nearest Neighbors (KNN)


-

K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest - neighbors vote. For regression, the target is predicted by local interpolation - of the targets associated of the nearest neighbors in the training set.

+

K-Nearest Neighbors, as the name clearly indicates, implements the +k-nearest neighbors vote. For regression, the target is predicted +by local interpolation of the targets associated with the nearest +neighbors in the training set.

Corresponding estimators are:

@@ -734,7 +735,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -746,8 +748,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -759,7 +761,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -769,9 +772,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -789,9 +792,9 @@

    Utility attributes
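Since the hunks above change results from a pd.DataFrame to a pd.Series, here is a hedged sketch of reading the utility attributes from a trained model; it assumes a KNN model was trained with bagging, and the bagging keyword of run is an assumption.

```python
# Assumes a KNN model trained with bagging, e.g. atom.run("KNN", bagging=5).
knn = atom.knn
print(knn.estimator)      # fitted estimator with the best hyperparameters
print(knn.time_fit)       # time to fit on the complete training set
print(knn.metric_train)   # metric score on the training set
print(knn.mean_bagging)   # mean of the bagging scores
print(knn.results)        # now a pd.Series of the training results
```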


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -835,8 +838,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.knn.plot_permutation_importance() - or atom.knn.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.knn.plot_permutation_importance() +or atom.knn.predict(X). The remaining utility methods can be found hereunder:

@@ -872,13 +875,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -895,13 +898,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -914,13 +918,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -963,7 +967,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/ksvm/index.html b/docs/API/models/ksvm/index.html index 85ed95996..48f373c6a 100644 --- a/docs/API/models/ksvm/index.html +++ b/docs/API/models/ksvm/index.html @@ -593,10 +593,10 @@

Kernel-SVM (kSVM)


The implementation of the Kernel (non-linear) Support Vector Machine is - based on libsvm. The fit time scales at least quadratically with the number - of samples and may be impractical beyond tens of thousands of samples. For - large datasets consider using a Linear Support Vector Machine - or a Stochastic Gradient descent model instead.

based on libsvm. The fit time scales at least quadratically with the +number of samples and may be impractical beyond tens of thousands of +samples. For large datasets, consider using a Linear Support Vector Machine +or a Stochastic Gradient Descent model instead.

The multiclass support is handled according to a one-vs-one scheme.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
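A sketch of the lazy evaluation described in the note above; the attribute name predict_test is an assumption standing in for the prediction attributes listed in the (elided) table.

```python
# Assumes a kSVM model was trained, e.g. atom.run("kSVM").
ksvm = atom.ksvm
y_pred = ksvm.predict_test   # calculated and cached on first access (name assumed)
y_again = ksvm.predict_test  # second access reuses the cached result
```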

@@ -837,8 +839,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X). +The remaining utility methods can be found hereunder:

@@ -874,13 +876,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -897,13 +899,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -916,13 +919,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -965,7 +968,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.
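Following the advice in the kSVM description to prefer linear models on large datasets, a hedged sketch of training the alternatives side by side; lSVM is the acronym used on this page, while SGD is an assumption.

```python
# On a large dataset, train the linear alternatives instead of kSVM.
# "lSVM" comes from this page; "SGD" is an assumed acronym.
atom.run(models=["lSVM", "SGD"], metric="f1")
print(atom.results)   # compare the trained models' scores
```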

diff --git a/docs/API/models/lasso/index.html b/docs/API/models/lasso/index.html index 0145bc7bb..b8f553f4d 100644 --- a/docs/API/models/lasso/index.html +++ b/docs/API/models/lasso/index.html @@ -603,9 +603,9 @@

Lasso Regression (Lasso)

Hyperparameters


@@ -718,7 +718,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -730,8 +731,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -743,7 +744,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -753,9 +755,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -773,9 +775,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -803,8 +805,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() - or atom.lasso.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() +or atom.lasso.predict(X). The remaining utility methods can be found hereunder:

@@ -835,13 +837,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

@@ -854,13 +857,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -891,7 +894,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.
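A hedged sketch of scoring(); it assumes an ATOMRegressor instance where a Lasso model was trained, that passing no metric falls back to the metric used during training, and that "r2" stands in for any sklearn scorer name.

```python
# Assumes a Lasso model trained on a regression task (ATOMRegressor).
lasso = atom.lasso
print(lasso.scoring())                       # training metric, evaluated on the test set
print(lasso.scoring("r2"))                   # any sklearn scorer name (assumed)
print(lasso.scoring("r2", dataset="train"))  # evaluate on the training set instead
```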

diff --git a/docs/API/models/lda/index.html b/docs/API/models/lda/index.html index 3bdeb0235..62d1912e3 100644 --- a/docs/API/models/lda/index.html +++ b/docs/API/models/lda/index.html @@ -592,10 +592,10 @@

Linear Discriminant Analysis (LDA)


-

Linear Discriminant Analysis is a classifier with a linear decision boundary, - generated by fitting class conditional densities to the data and using Bayes’ rule. - The model fits a Gaussian density to each class, assuming that all classes share - the same covariance matrix.

+

Linear Discriminant Analysis is a classifier with a linear decision +boundary, generated by fitting class conditional densities to the data +and using Bayes’ rule. The model fits a Gaussian density to each class, +assuming that all classes share the same covariance matrix.

Corresponding estimators are:

@@ -722,7 +722,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -734,8 +735,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -747,7 +748,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -757,9 +759,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -777,9 +779,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -831,8 +833,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.lda.plot_permutation_importance() - or atom.lda.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.lda.plot_permutation_importance() +or atom.lda.predict(X). The remaining utility methods can be found hereunder:

@@ -868,13 +870,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -891,13 +893,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -910,13 +913,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -959,7 +962,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/lgb/index.html b/docs/API/models/lgb/index.html index 01c1c1fb9..c3126d35b 100644 --- a/docs/API/models/lgb/index.html +++ b/docs/API/models/lgb/index.html @@ -592,8 +592,9 @@

LightGBM (LGB)


-

LightGBM is a gradient boosting model that uses tree based learning algorithms. It is - designed to be distributed and efficient with the following advantages:

+

LightGBM is a gradient boosting model that uses tree-based learning +algorithms. It is designed to be distributed and efficient with the +following advantages:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -876,8 +880,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X). +The remaining utility methods can be found hereunder:

@@ -913,13 +917,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -936,13 +940,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -955,13 +960,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -1004,7 +1009,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/lr/index.html b/docs/API/models/lr/index.html index 522e2af0a..9e8577d10 100644 --- a/docs/API/models/lr/index.html +++ b/docs/API/models/lr/index.html @@ -592,11 +592,12 @@

Logistic regression (LR)


-

Logistic regression, despite its name, is a linear model for classification rather - than regression. Logistic regression is also known in the literature as logit - regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. - In this model, the probabilities describing the possible outcomes of a single trial - are modeled using a logistic function.

+

Logistic regression, despite its name, is a linear model for +classification rather than regression. Logistic regression is also +known in the literature as logit regression, maximum-entropy +classification (MaxEnt) or the log-linear classifier. In this model, +the probabilities describing the possible outcomes of a single trial +are modeled using a logistic function.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -850,8 +853,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.lr.plot_permutation_importance() - or atom.lr.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.lr.plot_permutation_importance() +or atom.lr.predict(X). The remaining utility methods can be found hereunder:

@@ -887,13 +890,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -910,13 +913,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -929,13 +933,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -978,7 +982,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/lsvm/index.html b/docs/API/models/lsvm/index.html index 9ec6fd83a..391d69dc2 100644 --- a/docs/API/models/lsvm/index.html +++ b/docs/API/models/lsvm/index.html @@ -592,9 +592,10 @@

Linear-SVM (lSVM)


-

Similar to Kernel-SVM but with a linear kernel. Implemented in terms of - liblinear rather than libsvm, so it has more flexibility in the choice of penalties - and loss functions and should scale better to large numbers of samples.

+

Similar to Kernel-SVM but with a linear kernel. Implemented +in terms of liblinear rather than libsvm, so it has more flexibility +in the choice of penalties and loss functions and should scale better +to large numbers of samples.

The multiclass support is handled according to a one-vs-rest scheme.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -828,8 +831,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X). +The remaining utility methods can be found hereunder:

@@ -865,13 +868,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -888,13 +891,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -907,13 +911,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -956,7 +960,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/mlp/index.html b/docs/API/models/mlp/index.html index 0568c8a87..076942b61 100644 --- a/docs/API/models/mlp/index.html +++ b/docs/API/models/mlp/index.html @@ -592,11 +592,12 @@

Multi-layer Perceptron (MLP)


-

Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function - by training on a dataset. Given a set of features and a target, it can learn a - non-linear function approximator for either classification or regression. It is - different from logistic regression, in that between the input and the output layer, - there can be one or more non-linear layers, called hidden layers.

+

Multi-layer Perceptron (MLP) is a supervised learning algorithm that +learns a function by training on a dataset. Given a set of features +and a target, it can learn a non-linear function approximator for +either classification or regression. It is different from logistic +regression, in that between the input and the output layer, there can +be one or more non-linear layers, called hidden layers.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -868,8 +872,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X). +The remaining utility methods can be found hereunder:

@@ -905,13 +909,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -928,13 +932,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -947,13 +952,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -996,7 +1001,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/mnb/index.html b/docs/API/models/mnb/index.html index 7a87f319c..d11da7ba1 100644 --- a/docs/API/models/mnb/index.html +++ b/docs/API/models/mnb/index.html @@ -592,10 +592,11 @@

Multinomial Naive Bayes (MNB)


-

Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially - distributed data, and is one of the two classic Naive Bayes variants used in text - classification (where the data are typically represented as word vector counts, - although tf-idf vectors are also known to work well in practice).

+

Multinomial Naive Bayes implements the Naive Bayes algorithm for +multinomially distributed data, and is one of the two classic Naive +Bayes variants used in text classification (where the data are +typically represented as word vector counts, although tf-idf vectors +are also known to work well in practice).

Corresponding estimators are:

@@ -721,7 +722,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -733,8 +735,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -746,7 +748,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -756,9 +759,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -776,9 +779,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -822,8 +825,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X). +The remaining utility methods can be found hereunder:

@@ -859,13 +862,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -882,13 +885,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -901,13 +905,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -950,7 +954,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/ols/index.html b/docs/API/models/ols/index.html index 83b3c379d..3f93b3327 100644 --- a/docs/API/models/ols/index.html +++ b/docs/API/models/ols/index.html @@ -592,10 +592,10 @@

Ordinary Least Squares (OLS)


-

Ordinary Least Squares is just linear regression without any regularization. It fits - a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of - squares between the observed targets in the dataset, and the targets predicted by - the linear approximation.

+

Ordinary Least Squares is just linear regression without any +regularization. It fits a linear model with coefficients w = (w1, …, wp) +to minimize the residual sum of squares between the observed targets in +the dataset, and the targets predicted by the linear approximation.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -768,8 +769,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X). +The remaining utility methods can be found hereunder:

@@ -800,13 +801,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

@@ -819,13 +821,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -855,7 +857,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.
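A sketch of save_estimator() together with a plain pickle reload, so the fitted estimator can be used without atom; it assumes an OLS model trained in a regression setup, and the filename is illustrative.

```python
import pickle

# Save the fitted estimator to a pickle file (filename is illustrative).
atom.ols.save_estimator("ols_estimator.pkl")

# Load it back as a plain scikit-learn estimator, independent of atom.
with open("ols_estimator.pkl", "rb") as f:
    estimator = pickle.load(f)

print(estimator)   # ready for predict()/score() on new data
```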

diff --git a/docs/API/models/pa/index.html b/docs/API/models/pa/index.html index ff6aa1651..05915ecf0 100644 --- a/docs/API/models/pa/index.html +++ b/docs/API/models/pa/index.html @@ -592,9 +592,10 @@

Passive Aggressive (PA)


-

The passive-aggressive algorithms are a family of algorithms for large-scale learning. - They are similar to the Perceptron in that they do not require a learning rate. However, - contrary to the Perceptron, they include a regularization parameter C.

+

The passive-aggressive algorithms are a family of algorithms for +large-scale learning. They are similar to the Perceptron in that they +do not require a learning rate. However, contrary to the Perceptron, +they include a regularization parameter C.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -825,8 +828,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X). +The remaining utility methods can be found hereunder:

@@ -862,13 +865,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -885,13 +888,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -904,13 +908,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -953,7 +957,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/qda/index.html b/docs/API/models/qda/index.html index e93794b9a..547dcdb29 100644 --- a/docs/API/models/qda/index.html +++ b/docs/API/models/qda/index.html @@ -592,10 +592,10 @@

Quadratic Discriminant Analysis (QDA)


-

Linear Discriminant Analysis is a classifier with a quadratic decision boundary, - generated by fitting class conditional densities to the data and using Bayes’ rule. - The model fits a Gaussian density to each class, assuming that all classes share - the same covariance matrix.

+

Quadratic Discriminant Analysis is a classifier with a quadratic decision +boundary, generated by fitting class conditional densities to the data +and using Bayes’ rule. The model fits a Gaussian density to each class; +unlike LDA, the classes are not assumed to share the same covariance matrix.

Corresponding estimators are:

@@ -717,7 +717,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -729,8 +730,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -742,7 +743,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -752,9 +754,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -772,9 +774,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -826,8 +828,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.qda.plot_permutation_importance() - or atom.qda.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.qda.plot_permutation_importance() +or atom.qda.predict(X). The remaining utility methods can be found hereunder:

@@ -863,13 +865,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset.

@@ -886,13 +888,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -905,13 +908,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -954,7 +957,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/rf/index.html b/docs/API/models/rf/index.html index ad10dec6e..d8c53ac5f 100644 --- a/docs/API/models/rf/index.html +++ b/docs/API/models/rf/index.html @@ -592,10 +592,11 @@

Random Forest (RF)


-

Random forests are an ensemble learning method that operate by constructing a multitude - of decision trees at training time and outputting the class that is the mode of the - classes (classification) or mean prediction (regression) of the individual trees. - Random forests correct for decision trees" habit of overfitting to their training set.

+

Random forests are an ensemble learning method that operates by +constructing a multitude of decision trees at training time and +outputting the class that is the mode of the classes (classification) +or mean prediction (regression) of the individual trees. Random forests +correct for decision trees' habit of overfitting to their training set.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -859,8 +862,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X). +The remaining utility methods can be found hereunder:

@@ -896,13 +899,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -919,13 +922,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -938,13 +942,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -987,7 +991,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/ridge/index.html b/docs/API/models/ridge/index.html index 9cdd644fa..3696d79ee 100644 --- a/docs/API/models/ridge/index.html +++ b/docs/API/models/ridge/index.html @@ -605,9 +605,9 @@

Ridge Classification/Regression (R

Hyperparameters


@@ -720,7 +720,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -732,8 +733,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -745,7 +746,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -755,9 +757,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -775,9 +777,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -821,8 +823,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the model, e.g. atom.ridge.plot_permutation_importance() - or atom.ridge.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. atom.ridge.plot_permutation_importance() +or atom.ridge.predict(X). The remaining utility methods can be found hereunder:

@@ -858,13 +860,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -881,13 +883,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -900,13 +903,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -949,7 +952,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/rnn/index.html b/docs/API/models/rnn/index.html index b17697677..fd36ff701 100644 --- a/docs/API/models/rnn/index.html +++ b/docs/API/models/rnn/index.html @@ -592,10 +592,10 @@

Radius Nearest Neighbors (RNN)


-

Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors - are selected from within a given radius. For regression, the target is predicted - by local interpolation of the targets associated of the nearest neighbors in the - training set.

+

Radius Nearest Neighbors implements the nearest neighbors vote, where +the neighbors are selected from within a given radius. For regression, +the target is predicted by local interpolation of the targets associated +with the nearest neighbors in the training set.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -844,8 +847,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() - or atom.rnn.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() +or atom.rnn.predict(X). The remaining utility methods can be found hereunder:

@@ -881,13 +884,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -904,13 +907,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -923,13 +927,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -972,7 +976,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/sgd/index.html b/docs/API/models/sgd/index.html index 4bc1cf325..473c80e9d 100644 --- a/docs/API/models/sgd/index.html +++ b/docs/API/models/sgd/index.html @@ -592,10 +592,11 @@

Stochastic Gradient Descent (SGD)


-

Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear - classifiers and regressors under convex loss functions. Even though SGD has been - around in the machine learning community for a long time, it has received a - considerable amount of attention just recently in the context of large-scale learning.

+

Stochastic Gradient Descent is a simple yet very efficient approach to +fitting linear classifiers and regressors under convex loss functions. +Even though SGD has been around in the machine learning community for a +long time, it has received a considerable amount of attention just +recently in the context of large-scale learning.

Corresponding estimators are:


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -852,8 +855,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X). +The remaining utility methods can be found hereunder:

@@ -889,13 +892,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -912,13 +915,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -931,13 +935,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -980,7 +984,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/tree/index.html b/docs/API/models/tree/index.html index b60496a7f..e3f27ab4a 100644 --- a/docs/API/models/tree/index.html +++ b/docs/API/models/tree/index.html @@ -605,9 +605,9 @@

Decision Tree (Tree)

Hyperparameters


@@ -745,7 +745,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -757,8 +758,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -770,7 +771,8 @@

Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -780,9 +782,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -800,9 +802,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -846,8 +848,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.tree.plot_permutation_importance() - or atom.tree.predict(X). The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.tree.plot_permutation_importance() +or atom.tree.predict(X). The remaining utility methods can be found hereunder:

@@ -883,13 +885,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -906,13 +908,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -925,13 +928,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -974,7 +977,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/models/xgb/index.html b/docs/API/models/xgb/index.html index d1d939a62..71471c2fa 100644 --- a/docs/API/models/xgb/index.html +++ b/docs/API/models/xgb/index.html @@ -592,9 +592,10 @@

XGBoost (XGB)


-

XGBoost is an optimized distributed gradient boosting model designed to be highly - efficient, flexible and portable. XGBoost provides a parallel tree boosting that - solve many data science problems in a fast and accurate way.

+

XGBoost is an optimized distributed gradient boosting model designed to +be highly efficient, flexible and portable. XGBoost provides a parallel +tree boosting that solves many data science problems in a fast and +accurate way.

Corresponding estimators are:

+ + + +
Attributes: +dataset: pd.DataFrame +
+Complete dataset in the pipeline. +
+train: pd.DataFrame +
+Training set. +
+test: pd.DataFrame +
+Test set. +
+X: pd.DataFrame +
+Feature set. +
+y: pd.Series +
+Target column. +
+X_train: pd.DataFrame +
+Training features. +
+y_train: pd.Series +
+Training target. +
+X_test: pd.DataFrame +
+Test features. +
+y_test: pd.Series +
+Test target. +
+shape: tuple +
+Dataset's shape: (n_rows x n_columns) or +(n_rows, (shape_sample), n_cols) for deep learning datasets. +
+columns: list +
+Names of the columns in the dataset. +
+n_columns: int +
+Number of columns in the dataset. +
+features: list +
+Names of the features in the dataset. +
+n_features: int +
+Number of features in the dataset. +
+target: str +
+Name of the target column. +
+
+


Utility attributes

@@ -694,7 +757,8 @@

Utility attributes

estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -706,8 +770,8 @@

Utility attributes

time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -719,8 +783,9 @@

Utility attributes

evals: dict
-Dictionary of the metric calculated during training. The metric is provided by the estimator's - package and is different for every task. Available keys are: +Dictionary of the metric calculated during training. The metric is +provided by the estimator's package and is different for every task. +Available keys are:
  • "metric": Name of the metric.
  • "train": List of scores calculated on the training set.
  • @@ -729,7 +794,8 @@

    Utility attributes

metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -739,9 +805,9 @@

Utility attributes

Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -759,9 +825,9 @@

    Utility attributes


Prediction attributes

-

The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory.

+

The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.

@@ -805,8 +871,8 @@

Prediction attributes

Methods


The majority of the plots and prediction methods - can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X). - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X). +The remaining utility methods can be found hereunder:

@@ -842,13 +908,13 @@

Methods


method calibrate(**kwargs)
-
[source]
-

Applies probability calibration on the estimator. The calibration is done using the - CalibratedClassifierCV - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the estimator attribute. After calibrating, all prediction attributes will - reset. Only if classifier.

+
[source]
+

Applies probability calibration on the estimator. The calibration is done +using the CalibratedClassifierCV +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the estimator attribute. After calibrating, all +prediction attributes will reset. Only if classifier.

@@ -865,13 +931,14 @@

Methods


method delete()
-
[source]
+
[source]

Delete the model from the trainer.


method rename(name=None)
-
[source]
-

Change the model's tag. Note that the acronym always stays at the beginning of the model's name.

+
[source]
+

Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.

Parameters:
@@ -884,13 +951,13 @@

Methods


method reset_predictions()
-
[source]
+
[source]

Clear all the prediction attributes. - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
[source]
+
[source]

Get the scoring for a specific metric.

Parameters:
@@ -933,7 +1000,7 @@

Methods


method save_estimator(filename=None)
-
[source]
+
[source]

Save the estimator to a pickle file.

diff --git a/docs/API/plots/bar_plot/index.html b/docs/API/plots/bar_plot/index.html index 2ebb393cf..0703bdfc8 100644 --- a/docs/API/plots/bar_plot/index.html +++ b/docs/API/plots/bar_plot/index.html @@ -582,7 +582,7 @@

bar_plot


method bar_plot(models=None, index=None, show=None, target=1,
                 title=None, figsize=None, filename=None, display=True, **kwargs)
-
[source]
+
[source]

Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature @@ -594,9 +594,9 @@

bar_plot

models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: int, tuple, slice or None, optional (default=None)
diff --git a/docs/API/plots/beeswarm_plot/index.html b/docs/API/plots/beeswarm_plot/index.html index 8ec65fbd5..7098df069 100644 --- a/docs/API/plots/beeswarm_plot/index.html +++ b/docs/API/plots/beeswarm_plot/index.html @@ -582,7 +582,7 @@

beeswarm_plot


method beeswarm_plot(models=None, index=None, show=None, target=1,
                      title=None, figsize=None, filename=None, display=True, **kwargs)
-
+

Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the user guide.

@@ -591,9 +591,9 @@

beeswarm_plot

models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: tuple, slice or None, optional (default=None)
diff --git a/docs/API/plots/decision_plot/index.html b/docs/API/plots/decision_plot/index.html index 122ae74be..1ba6a5dbf 100644 --- a/docs/API/plots/decision_plot/index.html +++ b/docs/API/plots/decision_plot/index.html @@ -582,7 +582,7 @@

decision_plot


method decision_plot(models=None, index=None, show=None, target=1,
                      title=None, figsize=None, filename=None, display=True, **kwargs)
-
+

Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple @@ -595,9 +595,9 @@

decision_plot

models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: int, tuple, slice or None, optional (default=None)
@@ -610,8 +610,8 @@

decision_plot

target: int or str, optional (default=1)
-Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
title: str or None, optional (default=None)
diff --git a/docs/API/plots/force_plot/index.html b/docs/API/plots/force_plot/index.html index 229353c7d..fd0e36843 100644 --- a/docs/API/plots/force_plot/index.html +++ b/docs/API/plots/force_plot/index.html @@ -582,7 +582,7 @@

force_plot


method force_plot(models=None, index=None, target=1,
                   title=None, figsize=(14, 6), filename=None, display=True, **kwargs)
-
+

Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use matplotlib=True (this option is only available @@ -594,9 +594,9 @@

force_plot

models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: int, tuple, slice or None, optional (default=None)
@@ -605,8 +605,8 @@

force_plot

target: int or str, optional (default=1)
-Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
title: str or None, optional (default=None)
diff --git a/docs/API/plots/heatmap_plot/index.html b/docs/API/plots/heatmap_plot/index.html index d0484bf20..be728bc30 100644 --- a/docs/API/plots/heatmap_plot/index.html +++ b/docs/API/plots/heatmap_plot/index.html @@ -582,7 +582,7 @@

heatmap_plot


method heatmap_plot(models=None, index=None, show=None, target=1,
                     title=None, figsize=(8, 6), filename=None, display=True, **kwargs)
-
+

Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original @@ -594,9 +594,9 @@

heatmap_plot

models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: tuple, slice or None, optional (default=None)
@@ -610,8 +610,8 @@

heatmap_plot

target: int or str, optional (default=1)
-Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
title: str or None, optional (default=None)
diff --git a/docs/API/plots/plot_bo/index.html b/docs/API/plots/plot_bo/index.html index 93b6f2e77..e140e977e 100644 --- a/docs/API/plots/plot_bo/index.html +++ b/docs/API/plots/plot_bo/index.html @@ -581,7 +581,7 @@

plot_bo


method plot_bo(models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True)
-
+

Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by bo_params={"plot_bo": True} while running the optimization. Creates a canvas with two plots: the first plot shows diff --git a/docs/API/plots/plot_calibration/index.html b/docs/API/plots/plot_calibration/index.html index 42a9e0c84..9d38334f7 100644 --- a/docs/API/plots/plot_calibration/index.html +++ b/docs/API/plots/plot_calibration/index.html @@ -581,7 +581,7 @@

plot_calibration


method plot_calibration(models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True)
-
+

Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a diff --git a/docs/API/plots/plot_components/index.html b/docs/API/plots/plot_components/index.html index dd4f87262..bfc60f254 100644 --- a/docs/API/plots/plot_components/index.html +++ b/docs/API/plots/plot_components/index.html @@ -581,7 +581,7 @@

plot_components


method plot_components(show=None, title=None, figsize=None, filename=None, display=True)
-
+

Plot the explained variance ratio per component. Only available if PCA was applied on the data.

diff --git a/docs/API/plots/plot_confusion_matrix/index.html b/docs/API/plots/plot_confusion_matrix/index.html index 54f805764..30ce6e9e2 100644 --- a/docs/API/plots/plot_confusion_matrix/index.html +++ b/docs/API/plots/plot_confusion_matrix/index.html @@ -582,7 +582,7 @@

plot_confusion_matrix


method plot_confusion_matrix(models=None, dataset="test", normalize=False,
                              title=None, figsize=None, filename=None, display=True)
-
+

Plot a model's confusion matrix. Only for classification tasks.

  • For 1 model: plot the confusion matrix in a heatmap.
  • diff --git a/docs/API/plots/plot_correlation/index.html b/docs/API/plots/plot_correlation/index.html index d526496cc..12cd72c43 100644 --- a/docs/API/plots/plot_correlation/index.html +++ b/docs/API/plots/plot_correlation/index.html @@ -582,7 +582,7 @@

    plot_correlation


    method plot_correlation(columns=None, method="pearson", title=None, figsize=(8, 7), filename=None, display=True)
    -
    +

    Plot the data's correlation matrix.

diff --git a/docs/API/plots/plot_distribution/index.html b/docs/API/plots/plot_distribution/index.html index 0b87fb267..d1108ae43 100644 --- a/docs/API/plots/plot_distribution/index.html +++ b/docs/API/plots/plot_distribution/index.html @@ -583,7 +583,7 @@

plot_distribution

method plot_distribution(columns=0, distribution=None, show=None,
                          title=None, figsize=None, filename=None, display=True, **kwargs)
-
+

Plot column distributions. Additionally, it is possible to plot any of scipy.stats probability distributions fitted to the column. Missing values are ignored.

diff --git a/docs/API/plots/plot_errors/index.html b/docs/API/plots/plot_errors/index.html index b6c2d2d53..7d56686ba 100644 --- a/docs/API/plots/plot_errors/index.html +++ b/docs/API/plots/plot_errors/index.html @@ -581,7 +581,7 @@

plot_errors


method plot_errors(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
+

Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This plot can be useful to detect diff --git a/docs/API/plots/plot_evals/index.html index 57de34167..ec146424a 100644 --- a/docs/API/plots/plot_evals/index.html +++ b/docs/API/plots/plot_evals/index.html @@ -581,7 +581,7 @@

plot_evals


method plot_evals(models=None, dataset="both", title=None, figsize=(10, 6), filename=None, display=True)
-
+

Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation (XGB, LGB, CatB). The metric is provided by the estimator's diff --git a/docs/API/plots/plot_feature_importance/index.html b/docs/API/plots/plot_feature_importance/index.html index fe81fce55..0c032cd3c 100644 --- a/docs/API/plots/plot_feature_importance/index.html +++ b/docs/API/plots/plot_feature_importance/index.html @@ -581,10 +581,11 @@

plot_feature_importance


method plot_feature_importance(models=None, show=None, title=None, figsize=None, filename=None, display=True)
-
-

Plot a tree-based model's feature importance. The importances are normalized in order -to be able to compare them between models. The feature_importance attribute is -updated with the extracted importance ranking.

+ +

Plot a tree-based model's feature importance. The importances are +normalized in order to be able to compare them between models. The +feature_importance attribute is updated with the extracted importance +ranking.

diff --git a/docs/API/plots/plot_gains/index.html b/docs/API/plots/plot_gains/index.html index ef46591f8..7c9de4c7d 100644 --- a/docs/API/plots/plot_gains/index.html +++ b/docs/API/plots/plot_gains/index.html @@ -581,7 +581,7 @@

plot_gains


method plot_gains(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
+

Plot the cumulative gains curve. Only for binary classification tasks.

Parameters:
diff --git a/docs/API/plots/plot_learning_curve/index.html b/docs/API/plots/plot_learning_curve/index.html index 1706c52c9..a357a6f31 100644 --- a/docs/API/plots/plot_learning_curve/index.html +++ b/docs/API/plots/plot_learning_curve/index.html @@ -581,7 +581,7 @@

plot_learning_curve


method plot_learning_curve(models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True)
-
+

Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using train sizing.

diff --git a/docs/API/plots/plot_lift/index.html b/docs/API/plots/plot_lift/index.html index 93adfa59a..e51b9ea0d 100644 --- a/docs/API/plots/plot_lift/index.html +++ b/docs/API/plots/plot_lift/index.html @@ -581,7 +581,7 @@

plot_lift


method plot_lift(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
+

Plot the lift curve. Only for binary classification.

diff --git a/docs/API/plots/plot_partial_dependence/index.html b/docs/API/plots/plot_partial_dependence/index.html index 88b386850..f8ced701a 100644 --- a/docs/API/plots/plot_partial_dependence/index.html +++ b/docs/API/plots/plot_partial_dependence/index.html @@ -582,13 +582,14 @@

plot_partial_dependence


method plot_partial_dependence(models=None, features=None, target=None,
                                title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the partial dependence of features. The partial dependence of a feature (or a - set of features) corresponds to the average response of the model for each possible - value of the feature. Two-way partial dependence plots are plotted as contour plots - (only allowed for single model plots). The deciles of the feature values will be - shown with tick marks on the x-axes for one-way plots, and on both axes for two-way - plots.

+ +

Plot the partial dependence of features. The partial dependence of a +feature (or a set of features) corresponds to the average response of +the model for each possible value of the feature. Two-way partial +dependence plots are plotted as contour plots (only allowed for single +model plots). The deciles of the feature values will be shown with tick +marks on the x-axes for one-way plots, and on both axes for two-way +plots.
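A sketch of a possible call, assuming features can be passed by column index and that a tuple requests the two-way contour plot described above:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models="RF", metric="f1")

# One-way plots for the first two features plus one two-way contour plot.
atom.rf.plot_partial_dependence(features=[0, 1, (0, 1)])
```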

@@ -605,8 +606,8 @@

plot_partial_dependence

target: int or str, optional (default=1)
-Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
title: str or None, optional (default=None)
diff --git a/docs/API/plots/plot_pca/index.html b/docs/API/plots/plot_pca/index.html index 092dc46cc..418d6c537 100644 --- a/docs/API/plots/plot_pca/index.html +++ b/docs/API/plots/plot_pca/index.html @@ -581,9 +581,9 @@

plot_pca


method plot_pca(title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the explained variance ratio vs the number of components. Only available if PCA - was applied on the data.

+ +

Plot the explained variance ratio vs the number of components. Only +available if PCA was applied on the data.

Parameters:
diff --git a/docs/API/plots/plot_permutation_importance/index.html b/docs/API/plots/plot_permutation_importance/index.html index 2fa51b646..6ccbf682f 100644 --- a/docs/API/plots/plot_permutation_importance/index.html +++ b/docs/API/plots/plot_permutation_importance/index.html @@ -582,12 +582,13 @@

plot_permutation_importance


method plot_permutation_importance(models=None, show=None, n_repeats=10,
                                    title=None, figsize=None, filename=None, display=True)
-
-

Plot the feature permutation importance of models. Calculating all permutations can - be time-consuming, especially if n_repeats is high. They are stored under - the attribute permutations. This means that if a plot is repeated for - the same model with the same n_repeats, it will be considerably faster. - The feature_importance attribute is updated with the extracted importance ranking.

+ +

Plot the feature permutation importance of models. Calculating all +permutations can be time-consuming, especially if n_repeats is high. +They are stored under the attribute permutations. This means that if +a plot is repeated for the same model with the same n_repeats, it +will be considerably faster. The feature_importance attribute is +updated with the extracted importance ranking.

Parameters:
diff --git a/docs/API/plots/plot_pipeline/index.html b/docs/API/plots/plot_pipeline/index.html index 2c30fb6bb..8cda3cd56 100644 --- a/docs/API/plots/plot_pipeline/index.html +++ b/docs/API/plots/plot_pipeline/index.html @@ -581,7 +581,7 @@

plot_pipeline


method plot_pipeline(show_params=True, branch=None, title=None, figsize=None, filename=None, display=True)
-
+

Plot a diagram of every estimator in a branch.

Parameters:
diff --git a/docs/API/plots/plot_prc/index.html b/docs/API/plots/plot_prc/index.html index f765ab121..dcd749950 100644 --- a/docs/API/plots/plot_prc/index.html +++ b/docs/API/plots/plot_prc/index.html @@ -581,9 +581,9 @@

plot_prc


method plot_prc(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the precision-recall curve. The legend shows the average precision (AP) score. -Only for binary classification tasks.

+ +

Plot the precision-recall curve. The legend shows the average +precision (AP) score. Only for binary classification tasks.

diff --git a/docs/API/plots/plot_probabilities/index.html b/docs/API/plots/plot_probabilities/index.html index c4f06c99e..481601b7c 100644 --- a/docs/API/plots/plot_probabilities/index.html +++ b/docs/API/plots/plot_probabilities/index.html @@ -582,8 +582,9 @@

plot_probabilities


method plot_probabilities(models=None, dataset="test", target=1,
                           title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the probability distribution of the classes in the target column. Only for classification tasks.

+ +

Plot the probability distribution of the classes in the target column. +Only for classification tasks.

Parameters:
diff --git a/docs/API/plots/plot_residuals/index.html b/docs/API/plots/plot_residuals/index.html index 498804d7f..d7f67ec5f 100644 --- a/docs/API/plots/plot_residuals/index.html +++ b/docs/API/plots/plot_residuals/index.html @@ -581,14 +581,14 @@

plot_residuals


method plot_residuals(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
+

The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the -horizontal axis. The gray, intersected line shows the identity line. This -plot can be useful to analyze the variance of the error of the regressor. -If the points are randomly dispersed around the horizontal axis, a linear -regression model is appropriate for the data; otherwise, a non-linear model -is more appropriate. Only for regression tasks.

+horizontal axis. The gray, intersected line shows the identity line. +This plot can be useful to analyze the variance of the error of the +regressor. If the points are randomly dispersed around the horizontal +axis, a linear regression model is appropriate for the data; otherwise, +a non-linear model is more appropriate. Only for regression tasks.

Parameters:
diff --git a/docs/API/plots/plot_results/index.html b/docs/API/plots/plot_results/index.html index a0c2bf57c..3aff77b6e 100644 --- a/docs/API/plots/plot_results/index.html +++ b/docs/API/plots/plot_results/index.html @@ -581,13 +581,12 @@

plot_results


method plot_results(models=None, metric=0, title=None, figsize=None, filename=None, display=True)
-
-

Plot of the model results after the evaluation. -If all models applied bagging, the plot is a boxplot. -If not, the plot is a barplot. Models are ordered based -on their score from the top down. The score is either the -mean_bagging or metric_test attribute of the model, -selected in that order.

+ +

Plot of the model results after the evaluation. If all models applied +bagging, the plot is a boxplot. If not, the plot is a barplot. Models +are ordered based on their score from the top down. The score is either +the mean_bagging or metric_test attribute of the model, selected in +that order.
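For instance, a sketch under the assumption that run() accepts a bagging argument (consistent with the mean_bagging attribute mentioned above); the dataset is arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)

atom.run(models=["LR", "RF", "Tree"], metric="f1", bagging=5)
atom.plot_results()  # boxplot, since every model applied bagging
```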

Parameters:
diff --git a/docs/API/plots/plot_rfecv/index.html b/docs/API/plots/plot_rfecv/index.html index eabbb6878..f538f2fff 100644 --- a/docs/API/plots/plot_rfecv/index.html +++ b/docs/API/plots/plot_rfecv/index.html @@ -581,9 +581,10 @@

plot_rfecv


method plot_rfecv(title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every -subset of the dataset. Only available if RFECV was applied on the data.

+ +

Plot the RFECV results, i.e. the scores obtained by the estimator +fitted on every subset of the dataset. Only available if RFECV was +applied on the data.

Parameters:
diff --git a/docs/API/plots/plot_roc/index.html b/docs/API/plots/plot_roc/index.html index afd2deadf..40f3b9f1b 100644 --- a/docs/API/plots/plot_roc/index.html +++ b/docs/API/plots/plot_roc/index.html @@ -582,9 +582,9 @@

plot_roc


method plot_roc(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot the Receiver Operating Characteristics curve. The legend shows the Area Under -the ROC Curve (AUC) score. Only for binary classification tasks.

+ +

Plot the Receiver Operating Characteristics curve. The legend shows the +Area Under the ROC Curve (AUC) score. Only for binary classification tasks.

Parameters:
diff --git a/docs/API/plots/plot_scatter_matrix/index.html b/docs/API/plots/plot_scatter_matrix/index.html index 5c8da694b..af9e01296 100644 --- a/docs/API/plots/plot_scatter_matrix/index.html +++ b/docs/API/plots/plot_scatter_matrix/index.html @@ -582,7 +582,7 @@

plot_scatter_matrix


method plot_scatter_matrix(columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs)
-
+

Plot a matrix of scatter plots. A subset of max 250 random samples is selected from every column to not clutter the plot.

Parameters:
diff --git a/docs/API/plots/plot_successive_halving/index.html b/docs/API/plots/plot_successive_halving/index.html index 40af6333a..42ab02152 100644 --- a/docs/API/plots/plot_successive_halving/index.html +++ b/docs/API/plots/plot_successive_halving/index.html @@ -582,7 +582,7 @@

plot_successive_halving


method plot_successive_halving(models=None, metric=0, title=None,
                                figsize=(10, 6), filename=None, display=True)
-
+

Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using successive halving.

diff --git a/docs/API/plots/plot_threshold/index.html b/docs/API/plots/plot_threshold/index.html index 416571fe4..f17bcb3f0 100644 --- a/docs/API/plots/plot_threshold/index.html +++ b/docs/API/plots/plot_threshold/index.html @@ -582,8 +582,9 @@

plot_threshold


method plot_threshold(models=None, metric=None, dataset="test", steps=100,
                       title=None, figsize=(10, 6), filename=None, display=True)
-
-

Plot metric performances against threshold values. Only for binary classification tasks.

+ +

Plot metric performances against threshold values. Only for binary +classification tasks.
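A brief sketch, assuming metric accepts a scorer name or a sequence of them:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models="RF", metric="f1")

# Score f1 and recall over 100 equally spaced probability thresholds.
atom.rf.plot_threshold(metric=["f1", "recall"], steps=100)
```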

diff --git a/docs/API/plots/scatter_plot/index.html b/docs/API/plots/scatter_plot/index.html index f763791e6..56ca1862a 100644 --- a/docs/API/plots/scatter_plot/index.html +++ b/docs/API/plots/scatter_plot/index.html @@ -582,27 +582,28 @@

scatter_plot


method scatter_plot(models=None, index=None, feature=0, target=1,
                     title=None, figsize=(10, 6), filename=None, display=True, **kwargs)
-
-

Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and -the SHAP value of the same feature on the y-axis. This shows how the model -depends on the given feature, and is like a richer extension of the classical -partial dependence plots. Vertical dispersion of the data points represents -interaction effects. Read more about SHAP plots in the user guide.

+ +

Plot SHAP's scatter plot. Plots the value of the feature on the x-axis +and the SHAP value of the same feature on the y-axis. This shows how +the model depends on the given feature, and is like a richer extension +of the classical partial dependence plots. Vertical dispersion of the +data points represents interaction effects. Read more about SHAP plots +in the user guide.
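A minimal sketch, calling the plot from a single model (the feature is given by column index here) to avoid the multiple-models restriction noted in the parameters below:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models="RF", metric="f1")

# SHAP scatter plot of the first feature over all rows in the test set.
atom.rf.scatter_plot(feature=0)
```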

Parameters:
Parameters: models: str, sequence or None, optional (default=None)
-Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
index: tuple, slice or None, optional (default=None)
-Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows -n until m. If None, it selects all rows in the test set. The scatter plot does -not support plotting a single sample. +Indices of the rows in the dataset to plot. If tuple (n, m), it selects +rows n until m. If None, it selects all rows in the test set. The scatter +plot does not support plotting a single sample.
feature: int or str, optional (default=0)
@@ -610,8 +611,8 @@

scatter_plot

target: int or str, optional (default=1)
-Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
title: str or None, optional (default=None)
diff --git a/docs/API/plots/waterfall_plot/index.html b/docs/API/plots/waterfall_plot/index.html index b3c1c9ceb..becc73224 100644 --- a/docs/API/plots/waterfall_plot/index.html +++ b/docs/API/plots/waterfall_plot/index.html @@ -582,7 +582,7 @@

waterfall_plot


method waterfall_plot(models=None, index=None, show=None, target=1,
                       title=None, figsize=None, filename=None, display=True)
-
+

Plot SHAP's waterfall plot for a single prediction. The SHAP value of a feature represents the impact of the evidence provided by that feature on the model’s output. The waterfall plot diff --git a/docs/API/predicting/decision_function/index.html b/docs/API/predicting/decision_function/index.html index db29561fa..b6f909d75 100644 --- a/docs/API/predicting/decision_function/index.html +++ b/docs/API/predicting/decision_function/index.html @@ -581,7 +581,7 @@

decision_function


method decision_function(X, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called diff --git a/docs/API/predicting/predict/index.html b/docs/API/predicting/predict/index.html index c247e3c81..f1d054684 100644 --- a/docs/API/predicting/predict/index.html +++ b/docs/API/predicting/predict/index.html @@ -581,7 +581,7 @@

predict


method predict(X, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a diff --git a/docs/API/predicting/predict_log_proba/index.html b/docs/API/predicting/predict_log_proba/index.html index 85d25fd85..b8471ea4b 100644 --- a/docs/API/predicting/predict_log_proba/index.html +++ b/docs/API/predicting/predict_log_proba/index.html @@ -581,7 +581,7 @@
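A sketch of the prediction methods applied to new, untransformed data; the scaler and dataset are only examples:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.scale()                         # add a transformer to the branch
atom.run(models="RF", metric="f1")

X_new = X.iloc[:10]                  # stand-in for genuinely new data
print(atom.predict(X_new))           # transforms X_new, then predicts with the winner
print(atom.rf.predict_proba(X_new))  # the same, called from a specific model
```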

predict_log_proba


method predict_log_proba(X, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called diff --git a/docs/API/predicting/predict_proba/index.html b/docs/API/predicting/predict_proba/index.html index 1b6a6308a..cd656ce1f 100644 --- a/docs/API/predicting/predict_proba/index.html +++ b/docs/API/predicting/predict_proba/index.html @@ -581,7 +581,7 @@

predict_proba


method predict_proba(X, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from diff --git a/docs/API/predicting/score/index.html b/docs/API/predicting/score/index.html index 82f0352db..e79dbdb61 100644 --- a/docs/API/predicting/score/index.html +++ b/docs/API/predicting/score/index.html @@ -581,7 +581,7 @@

score


method score(X, y, sample_weights=None, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called diff --git a/docs/API/predicting/transform/index.html b/docs/API/predicting/transform/index.html index 4b4e9152b..be5830215 100644 --- a/docs/API/predicting/transform/index.html +++ b/docs/API/predicting/transform/index.html @@ -581,7 +581,7 @@

transform


method transform(X, y=None, pipeline=None, verbose=None) 
-
+

Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the pipeline parameter to customize diff --git a/docs/API/training/directclassifier/index.html b/docs/API/training/directclassifier/index.html index e50a495ba..2a9f612e9 100644 --- a/docs/API/training/directclassifier/index.html +++ b/docs/API/training/directclassifier/index.html @@ -995,7 +995,7 @@

Methods


method calibrate(**kwargs)
-
+

Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, @@ -1017,7 +1017,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.
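For example, a small sketch that draws two plots side by side inside the context manager:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models=["LR", "RF"], metric="f1")

with atom.canvas(nrows=1, ncols=2, title="Model comparison"):
    atom.plot_roc()
    atom.plot_prc()
```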

@@ -1055,7 +1055,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1072,7 +1072,7 @@

Methods


method get_class_weight(dataset="train")
-
+

Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -1121,7 +1121,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1145,7 +1145,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1172,7 +1172,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False.
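A minimal sketch of saving the instance without its data to keep the pickle small; the dataset is arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models="RF", metric="f1")

# Drop the dataset from the saved instance to avoid a large file.
atom.save("atom_run", save_data=False)
```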

@@ -1195,7 +1195,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1249,7 +1249,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1284,7 +1284,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.
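A rough sketch of adding a Voting and a Stacking ensemble to a fitted pipeline; how the resulting models are named may differ per version, so the final scoring call is only there to show that they join the other models:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, verbose=0, random_state=1)
atom.run(models=["LR", "RF", "Tree"], metric="f1")

atom.voting(models=["LR", "RF", "Tree"], weights=[1, 2, 1])
atom.stacking(models=["LR", "RF", "Tree"], passthrough=False)
atom.scoring()  # the ensembles are reported alongside the individual models
```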

diff --git a/docs/API/training/directregressor/index.html b/docs/API/training/directregressor/index.html index 30918f776..306244cf9 100644 --- a/docs/API/training/directregressor/index.html +++ b/docs/API/training/directregressor/index.html @@ -981,7 +981,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1019,7 +1019,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1059,7 +1059,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1083,7 +1083,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1110,7 +1110,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the instance contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1133,7 +1133,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1175,7 +1175,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1210,7 +1210,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.

diff --git a/docs/API/training/successivehalvingclassifier/index.html b/docs/API/training/successivehalvingclassifier/index.html index eb4028de4..1ed7c9c74 100644 --- a/docs/API/training/successivehalvingclassifier/index.html +++ b/docs/API/training/successivehalvingclassifier/index.html @@ -1000,7 +1000,7 @@

Methods


method calibrate(**kwargs)
-
+

Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, @@ -1022,7 +1022,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1060,7 +1060,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1077,7 +1077,7 @@

Methods


method get_class_weight(dataset="train")
-
+

Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -1126,7 +1126,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1150,7 +1150,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1177,7 +1177,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the instance contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1200,7 +1200,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1254,7 +1254,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1289,7 +1289,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.

diff --git a/docs/API/training/successivehalvingregressor/index.html b/docs/API/training/successivehalvingregressor/index.html index 858e7ec70..d2d160ecc 100644 --- a/docs/API/training/successivehalvingregressor/index.html +++ b/docs/API/training/successivehalvingregressor/index.html @@ -987,7 +987,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1025,7 +1025,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1065,7 +1065,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1089,7 +1089,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1116,7 +1116,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the instance contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1139,7 +1139,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1181,7 +1181,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1216,7 +1216,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.

diff --git a/docs/API/training/trainsizingclassifier/index.html b/docs/API/training/trainsizingclassifier/index.html index 67ef262c2..423ca2648 100644 --- a/docs/API/training/trainsizingclassifier/index.html +++ b/docs/API/training/trainsizingclassifier/index.html @@ -1003,7 +1003,7 @@

Methods


method calibrate(**kwargs)
-
+

Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, @@ -1025,7 +1025,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1063,7 +1063,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1080,7 +1080,7 @@

Methods


method get_class_weight(dataset="train")
-
+

Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -1129,7 +1129,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1153,7 +1153,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1180,7 +1180,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the instance contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1203,7 +1203,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1257,7 +1257,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1292,7 +1292,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.

diff --git a/docs/API/training/trainsizingregressor/index.html b/docs/API/training/trainsizingregressor/index.html index 27ae0a30b..9ae8b95c2 100644 --- a/docs/API/training/trainsizingregressor/index.html +++ b/docs/API/training/trainsizingregressor/index.html @@ -991,7 +991,7 @@

Methods


method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+

This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case.

@@ -1029,7 +1029,7 @@

Methods


method delete(models=None)
-
+

Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance.

@@ -1069,7 +1069,7 @@

Methods


method log(msg, level=0)
-
+

Write a message to the logger and print it to stdout.

@@ -1093,7 +1093,7 @@

Methods




method reset_predictions()
-
+

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


@@ -1120,7 +1120,7 @@

Methods


method save(filename=None, save_data=True)
-
+

Save the instance to a pickle file. Remember that the instance contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False.

@@ -1143,7 +1143,7 @@

Methods


method scoring(metric=None, dataset="test", **kwargs)
-
+

Print all the models' scoring for a specific metric.

@@ -1185,7 +1185,7 @@

Methods


method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+

Add a Stacking instance to the models in the pipeline.

@@ -1220,7 +1220,7 @@

Methods


method voting(models=None, weights=None)
-
+

Add a Voting instance to the models in the pipeline.

diff --git a/docs/index.html b/docs/index.html index 8d32a572a..c059cc339 100644 --- a/docs/index.html +++ b/docs/index.html @@ -997,5 +997,5 @@ diff --git a/docs/search/search_index.json b/docs/search/search_index.json index f0ccbc947..e7e386c83 100644 --- a/docs/search/search_index.json +++ b/docs/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Automated Tool for Optimized Modelling A Python package for fast exploration of machine learning pipelines During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. Example steps taken by ATOM's pipeline: Data Cleaning Handle missing values Encode categorical features Detect and remove outliers Balance the training set Feature engineering Create new non-linear features Remove multi-collinear features Remove features with too low variance Select the most promising features Train and validate multiple models Select hyperparameters using a Bayesian Optimization approach Train and test the models on the provided data Assess the robustness of the output using a bagging algorithm Analyze the results Get the model scores on various metrics Make plots to compare the model performances Figure 1. Diagram of the possible steps taken by ATOM. Release history Version 4.4.0 The drop method now allows the user to drop columns as part of the pipeline. It is now possible to add data transformations as function to the pipeline through the apply method. Added the status method to save an overview of atom's branches and models to the logger. Improved the output messages for the Imputer class. The dataset's columns can now be called directly from atom. The distribution and plot_distribution methods now ignore missing values instead of raising an exception. Fixed a bug where transformations failed when columns were added after initializing the pipeline. Fixed a bug where the Cleaner class didn't drop columns with only missing values for minimum_cardinality=True . Fixed a bug where the winning model wasn't displayed correctly. Refactored the way transformers are added or removed from predicting methods. Improved documentation. Version 4.3.0 Possibility to add custom transformers to the pipeline. 
The export_pipeline utility method exports atom's current pipeline to a sklearn object. Use AutoML to automate the search for an optimized pipeline. New magic methods makes atom behave similarly to sklearn's Pipeline . All training approaches can now be combined in the same atom instance. New plot_scatter_matrix , plot_distribution and plot_qq for data inspection. Complete rework of all the shap plots to be consistent with their new API. Improvements for the Scaler and Pruner classes. The acronym for custom models now defaults to the capital letters in the class' __name__. Possibility to apply transformations on only a subset of the columns. Plots and methods now accept winner as model name. Fixed a bug where custom metrics didn't show the correct name. Fixed a bug where timers were not displayed correctly. Further compatibility with deep learning datasets. Large refactoring for performance optimization. Cleaner output of messages to the logger. Plots no longer show a default title. Added the AutoML example notebook. Minor bug fixes. Version 4.2.1 Bug fix where there was memory leakage in successive halving and train sizing pipelines. The XGBoost , LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models , e.g. pip install -U atom-ml[models] . Improved documentation. Version 4.2.0 Possibility to add custom models to the pipeline using ATOMModel . Compatibility with deep learning models. New branch system for different data pipelines. Read more in the user guide . Use the canvas contextmanager to draw multiple plots in one figure. New voting and stacking ensemble techniques. New get_class_weight utility method. New Sequential Feature Selection strategy for the FeatureSelector . Added the sample_weight parameter to the score method. New ways to initialize the data in the training instances. The n_rows parameter in ATOMLoader is deprecated in favour of the new data input formats. The test_size parameter now also allows integer values. Renamed categories to classes to be consistent with sklearn's API. The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset. Possibility to add custom parameters to an estimator's fit method through est_params . Successive halving and train sizing now both allow subsequent runs from atom without losing previous information. Bug fix where ATOMLoader wouldn't encode the target column during transformation. Added the Deep learning , Ensembles and Utilities example notebooks. Compatibility with python 3.9 . Version 4.1.0 Added the est_params parameter to customize the parameters passed to every model's estimator. Following skopt's API, the n_random_starts parameter is deprecated in favour of n_initial_points . The Balancer class now allows you to use any of the strategies from imblearn . New utility attributes to inspect the dataset. Four new models: CatNB , CNB , ARD and RNN . Added the models section to the documentation. Small changes in log outputs. Bug fixes and performance improvements. Version 4.0.1 Bug fix where the DFS strategy in FeatureGenerator was not deterministic for a fixed random state. Bug fix where subsequent runs with the same metric failed. Added the license file to the package's installer. Typo fixes in documentation. Version 4.0.0 Bayesian optimization package changed from GpyOpt to skopt . Complete revision of the model's hyperparameters. Four SHAP plots can now be called directly from an ATOM pipeline. 
Two new plots for regression tasks. New plot_pipeline and pipeline attribute to access all transformers. Possibility to determine transformer parameters per method. New calibration method and plot . Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs). Implementation of multi-metric runs. Possibility to choose which metric to plot. Early stopping for models that allow in-training evaluation. Added the ATOMLoader function to load saved atom instances and directly apply all data transformations. The \"remove\" strategy in the data cleaning parameters is deprecated in favour of \"drop\". Implemented the DFS strategy in FeatureGenerator . All training classes now inherit from BaseEstimator. Added multiple new example notebooks. Tests coverage up to 100%. Completely new documentation page. Bug fixes and performance improvements. Content Getting started User guide API ATOM ATOMClassifier ATOMRegressor ATOMLoader ATOMModel Data cleaning Scaler Cleaner Imputer Encoder Pruner Balancer Feature engineering FeatureGenerator FeatureSelector Training Direct DirectClassifier DirectRegressor SuccessiveHalving SuccessiveHalvingClassifier SuccessiveHalvingClassifier TrainSizing TrainSizingClassifier TrainSizingRegressor Models Gaussian Process Gaussian Naive Bayes Multinomial Naive Bayes Bernoulli Naive Bayes Categorical Naive Bayes Complement Naive Bayes Ordinary Least Squares Ridge Lasso Elastic Net Bayesian Ridge Automated Relevance Determination Logistic Regression Linear Discriminant Analysis Quadratic Discriminant Analysis K-Nearest Neighbors Radius Nearest Neighbors Decision Tree Bagging Extra-Trees Random Forest AdaBoost Gradient Boosting Machine XGBoost LightGBM CatBoost Linear-SVM Kernel-SVM Passive Aggressive Stochastic Gradient Descent Multi-layer Perceptron Predicting transform predict predict_proba predict_log_proba decision_function score Plots plot_correlation plot_scatter_matrix plot_distribution plot_qq plot_pipeline plot_pca plot_components plot_rfecv plot_successive_halving plot_learning_curve plot_results plot_bo plot_evals plot_roc plot_prc plot_permutation_importance plot_feature_importance plot_partial_dependence plot_errors plot_residuals plot_confusion_matrix plot_threshold plot_probabilities plot_calibration plot_gains plot_lift bar_plot beeswarm_plot decision_plot force_plot heatmap_plot scatter_plot waterfall_plot Examples AutoML Binary classification Calibration Deep learning Early stopping Ensembles Feature engineering Imbalanced datasets Multiclass classification Multi-metric runs Regression Successive halving Train sizing Utilities FAQ Dependencies License","title":"Home"},{"location":"#automated-tool-for-optimized-modelling","text":"","title":"Automated Tool for Optimized Modelling"},{"location":"#a-python-package-for-fast-exploration-of-machine-learning-pipelines","text":"During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? 
How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. Example steps taken by ATOM's pipeline: Data Cleaning Handle missing values Encode categorical features Detect and remove outliers Balance the training set Feature engineering Create new non-linear features Remove multi-collinear features Remove features with too low variance Select the most promising features Train and validate multiple models Select hyperparameters using a Bayesian Optimization approach Train and test the models on the provided data Assess the robustness of the output using a bagging algorithm Analyze the results Get the model scores on various metrics Make plots to compare the model performances Figure 1. Diagram of the possible steps taken by ATOM.","title":"A Python package for fast exploration of machine learning pipelines"},{"location":"#release-history","text":"","title":"Release history"},{"location":"#version-440","text":"The drop method now allows the user to drop columns as part of the pipeline. It is now possible to add data transformations as function to the pipeline through the apply method. Added the status method to save an overview of atom's branches and models to the logger. Improved the output messages for the Imputer class. The dataset's columns can now be called directly from atom. The distribution and plot_distribution methods now ignore missing values instead of raising an exception. Fixed a bug where transformations failed when columns were added after initializing the pipeline. Fixed a bug where the Cleaner class didn't drop columns with only missing values for minimum_cardinality=True . Fixed a bug where the winning model wasn't displayed correctly. Refactored the way transformers are added or removed from predicting methods. Improved documentation.","title":"Version 4.4.0"},{"location":"#version-430","text":"Possibility to add custom transformers to the pipeline. The export_pipeline utility method exports atom's current pipeline to a sklearn object. Use AutoML to automate the search for an optimized pipeline. New magic methods makes atom behave similarly to sklearn's Pipeline . All training approaches can now be combined in the same atom instance. New plot_scatter_matrix , plot_distribution and plot_qq for data inspection. Complete rework of all the shap plots to be consistent with their new API. Improvements for the Scaler and Pruner classes. The acronym for custom models now defaults to the capital letters in the class' __name__. Possibility to apply transformations on only a subset of the columns. Plots and methods now accept winner as model name. Fixed a bug where custom metrics didn't show the correct name. Fixed a bug where timers were not displayed correctly. Further compatibility with deep learning datasets. Large refactoring for performance optimization. Cleaner output of messages to the logger. Plots no longer show a default title. Added the AutoML example notebook. 
Minor bug fixes.","title":"Version 4.3.0"},{"location":"#version-421","text":"Bug fix where there was memory leakage in successive halving and train sizing pipelines. The XGBoost , LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models , e.g. pip install -U atom-ml[models] . Improved documentation.","title":"Version 4.2.1"},{"location":"#version-420","text":"Possibility to add custom models to the pipeline using ATOMModel . Compatibility with deep learning models. New branch system for different data pipelines. Read more in the user guide . Use the canvas contextmanager to draw multiple plots in one figure. New voting and stacking ensemble techniques. New get_class_weight utility method. New Sequential Feature Selection strategy for the FeatureSelector . Added the sample_weight parameter to the score method. New ways to initialize the data in the training instances. The n_rows parameter in ATOMLoader is deprecated in favour of the new data input formats. The test_size parameter now also allows integer values. Renamed categories to classes to be consistent with sklearn's API. The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset. Possibility to add custom parameters to an estimator's fit method through est_params . Successive halving and train sizing now both allow subsequent runs from atom without losing previous information. Bug fix where ATOMLoader wouldn't encode the target column during transformation. Added the Deep learning , Ensembles and Utilities example notebooks. Compatibility with python 3.9 .","title":"Version 4.2.0"},{"location":"#version-410","text":"Added the est_params parameter to customize the parameters passed to every model's estimator. Following skopt's API, the n_random_starts parameter is deprecated in favour of n_initial_points . The Balancer class now allows you to use any of the strategies from imblearn . New utility attributes to inspect the dataset. Four new models: CatNB , CNB , ARD and RNN . Added the models section to the documentation. Small changes in log outputs. Bug fixes and performance improvements.","title":"Version 4.1.0"},{"location":"#version-401","text":"Bug fix where the DFS strategy in FeatureGenerator was not deterministic for a fixed random state. Bug fix where subsequent runs with the same metric failed. Added the license file to the package's installer. Typo fixes in documentation.","title":"Version 4.0.1"},{"location":"#version-400","text":"Bayesian optimization package changed from GpyOpt to skopt . Complete revision of the model's hyperparameters. Four SHAP plots can now be called directly from an ATOM pipeline. Two new plots for regression tasks. New plot_pipeline and pipeline attribute to access all transformers. Possibility to determine transformer parameters per method. New calibration method and plot . Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs). Implementation of multi-metric runs. Possibility to choose which metric to plot. Early stopping for models that allow in-training evaluation. Added the ATOMLoader function to load saved atom instances and directly apply all data transformations. The \"remove\" strategy in the data cleaning parameters is deprecated in favour of \"drop\". Implemented the DFS strategy in FeatureGenerator . All training classes now inherit from BaseEstimator. Added multiple new example notebooks. Tests coverage up to 100%. 
Completely new documentation page. Bug fixes and performance improvements.","title":"Version 4.0.0"},{"location":"#content","text":"Getting started User guide API ATOM ATOMClassifier ATOMRegressor ATOMLoader ATOMModel Data cleaning Scaler Cleaner Imputer Encoder Pruner Balancer Feature engineering FeatureGenerator FeatureSelector Training Direct DirectClassifier DirectRegressor SuccessiveHalving SuccessiveHalvingClassifier SuccessiveHalvingClassifier TrainSizing TrainSizingClassifier TrainSizingRegressor Models Gaussian Process Gaussian Naive Bayes Multinomial Naive Bayes Bernoulli Naive Bayes Categorical Naive Bayes Complement Naive Bayes Ordinary Least Squares Ridge Lasso Elastic Net Bayesian Ridge Automated Relevance Determination Logistic Regression Linear Discriminant Analysis Quadratic Discriminant Analysis K-Nearest Neighbors Radius Nearest Neighbors Decision Tree Bagging Extra-Trees Random Forest AdaBoost Gradient Boosting Machine XGBoost LightGBM CatBoost Linear-SVM Kernel-SVM Passive Aggressive Stochastic Gradient Descent Multi-layer Perceptron Predicting transform predict predict_proba predict_log_proba decision_function score Plots plot_correlation plot_scatter_matrix plot_distribution plot_qq plot_pipeline plot_pca plot_components plot_rfecv plot_successive_halving plot_learning_curve plot_results plot_bo plot_evals plot_roc plot_prc plot_permutation_importance plot_feature_importance plot_partial_dependence plot_errors plot_residuals plot_confusion_matrix plot_threshold plot_probabilities plot_calibration plot_gains plot_lift bar_plot beeswarm_plot decision_plot force_plot heatmap_plot scatter_plot waterfall_plot Examples AutoML Binary classification Calibration Deep learning Early stopping Ensembles Feature engineering Imbalanced datasets Multiclass classification Multi-metric runs Regression Successive halving Train sizing Utilities FAQ Dependencies License","title":"Content"},{"location":"dependencies/","text":"Python As of the moment, ATOM supports Python 3.6 , 3.7 , 3.8 and 3.9 . Packages ATOM is built on top of several existing Python libraries. The required packages are necessary for it's correct functioning. Additionally, you can install some optional packages to use machine learning estimators not provided by sklearn. Required numpy (>=1.19.5) scipy (>=1.4.1) pandas (>=1.0.3) dill (>=0.3.3) tqdm (>=4.35.0) joblib (>=0.16.0) typeguard (>=2.7.1) tabulate (>=0.8.6) scikit-learn (>=0.24) scikit-optimize (>=0.7.4) tpot (>=0.11.7) pandas-profiling (>=2.3.0) category-encoders (>=2.1.0) imbalanced-learn (>=0.5.0) featuretools (>=0.17.0) gplearn (>=0.4.1) matplotlib (>=3.3.0) seaborn (>=0.9.0) shap (>=0.38.1) Optional xgboost (>=0.90) lightgbm (>=2.3.0) catboost (>=0.19.1) Support ATOM recognizes the support from JetBrains by providing core project contributors with a set of developer tools free of charge.","title":"Dependencies"},{"location":"dependencies/#python","text":"As of the moment, ATOM supports Python 3.6 , 3.7 , 3.8 and 3.9 .","title":"Python"},{"location":"dependencies/#packages","text":"ATOM is built on top of several existing Python libraries. The required packages are necessary for it's correct functioning. 
Additionally, you can install some optional packages to use machine learning estimators not provided by sklearn.","title":"Packages"},{"location":"dependencies/#required","text":"numpy (>=1.19.5) scipy (>=1.4.1) pandas (>=1.0.3) dill (>=0.3.3) tqdm (>=4.35.0) joblib (>=0.16.0) typeguard (>=2.7.1) tabulate (>=0.8.6) scikit-learn (>=0.24) scikit-optimize (>=0.7.4) tpot (>=0.11.7) pandas-profiling (>=2.3.0) category-encoders (>=2.1.0) imbalanced-learn (>=0.5.0) featuretools (>=0.17.0) gplearn (>=0.4.1) matplotlib (>=3.3.0) seaborn (>=0.9.0) shap (>=0.38.1)","title":"Required"},{"location":"dependencies/#optional","text":"xgboost (>=0.90) lightgbm (>=2.3.0) catboost (>=0.19.1)","title":"Optional"},{"location":"dependencies/#support","text":"ATOM recognizes the support from JetBrains by providing core project contributors with a set of developer tools free of charge.","title":"Support"},{"location":"faq/","text":"Frequently asked questions There already is an atom text editor. Does this has anything to do with that? How does ATOM relate to AutoML? Is it possible to run deep learning models? Can I run atom's methods on just a subset of the columns? How can I compare the same model on different datasets? Can I train models through atom using a GPU? How are numerical and categorical columns differentiated? Can I run unsupervised learning pipelines? Is there a way to plot multiple models in the same shap plot? Can I merge a sklearn pipeline with atom? Is it possible to initialize atom with an existing train and test set? There already is an atom text editor. Does this has anything to do with that? There is, indeed, a text editor with the same name and a similar logo. Is this a shameless copy? No. When I started the project, I didn't know about the text editor, and it doesn't require much thinking to come up with the idea of replacing the letter O of the word atom with the image of an atom. How does ATOM relate to AutoML? ATOM is not an AutoML tool since it does not automate the search for an optimal pipeline like well known AutoML tools such as auto-sklearn or TPOT do. Instead, ATOM helps the user find the optimal pipeline himself. One of the goals of this package is to help data scientists produce explainable pipelines, and using an AutoML black box function would impede that. That said, it is possible to integrate a TPOT pipeline with atom through the automl method. Is it possible to run deep learning models? Yes. Deep learning models can be added as custom models to the pipeline as long as they follow sklearn's API . If the dataset is 2-dimensional, everything should work normally. If the dataset has more than 2 dimensions (referred in the documentation as deep learning datasets, often the case for images or text embeddings), only a subset of atom's methods will work. For more information, see the deep learning section of the user guide. Can I run atom's methods on just a subset of the columns? Yes, all data cleaning and feature engineering methods accept a columns parameter to only transform the selected features. For example, to only impute the numerical columns in the dataset we could type atom.impute(strat_num=\"mean\", columns=atom.numerical) . The parameter accepts column names, column indices or a slice object. How can I compare the same model on different datasets? In many occasions you might want to test how a model performs on datasets processed with different pipelines. For this, atom has the branch system . 
Create a new branch for every new pipeline you want to test and use the plot methods to compare all models, independent of the branch it was trained on. Can I train models through atom using a GPU? ATOM doesn't fit the models himself. The underlying models' package does. Since the majority of predefined models are implemented through sklearn and sklearn works on CPU only, they can not be trained on any GPU. If you are using a custom model whose package, Keras for example, allows GPU implementation and the settings or model parameters are tuned to do so, the model will train on the GPU like it would do outside atom. How are numerical and categorical columns differentiated? The columns are separated using pandas' select_dtypes method for dataframes. Numerical columns are selected using include=\"number\" whereas categorical columns are selected using exclude=\"number\" . Can I run unsupervised learning pipelines? No. As for now, ATOM only supports supervised machine learning pipelines. However, various unsupervised algorithms can be chosen as strategy in the Pruner class to detect and remove outliers from the dataset. Is there a way to plot multiple models in the same shap plot? No. Unfortunately, there is no way to plot multiple models in the same shap plot since the plots are made by the SHAP package and passed as matplotlib.axes objects to atom. This means that it's not within the reach of this package to implement such an utility. Can I merge a sklearn pipeline with atom? Yes. Like any other transformer, it is possible to add a sklearn pipeline to atom using the add method. Every transformer in the pipeline is merged independently. The pipeline is not allowed to end with a model since atom manages its own models. If that is the case, add the pipeline using atom.add(pipeline[:-1]) . Is it possible to initialize atom with an existing train and test set? Yes. If you already have a separated train and test set you can initialize atom in two ways: atom = ATOMClassifier(train, test) atom = ATOMClassifier((X_train, y_train), (X_test, y_test)) Make sure the train and test size have the same number of columns. If initialized like this, the test_size parameter is ignored.","title":"FAQ"},{"location":"faq/#frequently-asked-questions","text":"There already is an atom text editor. Does this has anything to do with that? How does ATOM relate to AutoML? Is it possible to run deep learning models? Can I run atom's methods on just a subset of the columns? How can I compare the same model on different datasets? Can I train models through atom using a GPU? How are numerical and categorical columns differentiated? Can I run unsupervised learning pipelines? Is there a way to plot multiple models in the same shap plot? Can I merge a sklearn pipeline with atom? Is it possible to initialize atom with an existing train and test set?","title":"Frequently asked questions"},{"location":"faq/#there-already-is-an-atom-text-editor-does-this-has-anything-to-do-with-that","text":"There is, indeed, a text editor with the same name and a similar logo. Is this a shameless copy? No. When I started the project, I didn't know about the text editor, and it doesn't require much thinking to come up with the idea of replacing the letter O of the word atom with the image of an atom.","title":"There already is an atom text editor. 
Does this has anything to do with that?"},{"location":"faq/#how-does-atom-relate-to-automl","text":"ATOM is not an AutoML tool since it does not automate the search for an optimal pipeline like well known AutoML tools such as auto-sklearn or TPOT do. Instead, ATOM helps the user find the optimal pipeline himself. One of the goals of this package is to help data scientists produce explainable pipelines, and using an AutoML black box function would impede that. That said, it is possible to integrate a TPOT pipeline with atom through the automl method.","title":"How does ATOM relate to AutoML?"},{"location":"faq/#is-it-possible-to-run-deep-learning-models","text":"Yes. Deep learning models can be added as custom models to the pipeline as long as they follow sklearn's API . If the dataset is 2-dimensional, everything should work normally. If the dataset has more than 2 dimensions (referred in the documentation as deep learning datasets, often the case for images or text embeddings), only a subset of atom's methods will work. For more information, see the deep learning section of the user guide.","title":"Is it possible to run deep learning models?"},{"location":"faq/#can-i-run-atoms-methods-on-just-a-subset-of-the-columns","text":"Yes, all data cleaning and feature engineering methods accept a columns parameter to only transform the selected features. For example, to only impute the numerical columns in the dataset we could type atom.impute(strat_num=\"mean\", columns=atom.numerical) . The parameter accepts column names, column indices or a slice object.","title":"Can I run atom's methods on just a subset of the columns?"},{"location":"faq/#how-can-i-compare-the-same-model-on-different-datasets","text":"In many occasions you might want to test how a model performs on datasets processed with different pipelines. For this, atom has the branch system . Create a new branch for every new pipeline you want to test and use the plot methods to compare all models, independent of the branch it was trained on.","title":"How can I compare the same model on different datasets?"},{"location":"faq/#can-i-train-models-through-atom-using-a-gpu","text":"ATOM doesn't fit the models himself. The underlying models' package does. Since the majority of predefined models are implemented through sklearn and sklearn works on CPU only, they can not be trained on any GPU. If you are using a custom model whose package, Keras for example, allows GPU implementation and the settings or model parameters are tuned to do so, the model will train on the GPU like it would do outside atom.","title":"Can I train models through atom using a GPU?"},{"location":"faq/#how-are-numerical-and-categorical-columns-differentiated","text":"The columns are separated using pandas' select_dtypes method for dataframes. Numerical columns are selected using include=\"number\" whereas categorical columns are selected using exclude=\"number\" .","title":"How are numerical and categorical columns differentiated?"},{"location":"faq/#can-i-run-unsupervised-learning-pipelines","text":"No. As for now, ATOM only supports supervised machine learning pipelines. However, various unsupervised algorithms can be chosen as strategy in the Pruner class to detect and remove outliers from the dataset.","title":"Can I run unsupervised learning pipelines?"},{"location":"faq/#is-there-a-way-to-plot-multiple-models-in-the-same-shap-plot","text":"No. 
Unfortunately, there is no way to plot multiple models in the same shap plot since the plots are made by the SHAP package and passed as matplotlib.axes objects to atom. This means that it's not within the reach of this package to implement such an utility.","title":"Is there a way to plot multiple models in the same shap plot?"},{"location":"faq/#can-i-merge-a-sklearn-pipeline-with-atom","text":"Yes. Like any other transformer, it is possible to add a sklearn pipeline to atom using the add method. Every transformer in the pipeline is merged independently. The pipeline is not allowed to end with a model since atom manages its own models. If that is the case, add the pipeline using atom.add(pipeline[:-1]) .","title":"Can I merge a sklearn pipeline with atom?"},{"location":"faq/#is-it-possible-to-initialize-atom-with-an-existing-train-and-test-set","text":"Yes. If you already have a separated train and test set you can initialize atom in two ways: atom = ATOMClassifier(train, test) atom = ATOMClassifier((X_train, y_train), (X_test, y_test)) Make sure the train and test size have the same number of columns. If initialized like this, the test_size parameter is ignored.","title":"Is it possible to initialize atom with an existing train and test set?"},{"location":"getting_started/","text":"Installation Install ATOM's newest release easily via pip : $ pip install -U atom-ml or via conda : $ conda install -c conda-forge atom-ml Note that using these commands also install/update all required dependencies . To install the optional dependencies , add [models] after the package's name. $ pip install -U atom-ml[models] Note Since atom was already taken, download the package under the name atom-ml ! Usage Call the ATOMClassifier or ATOMRegressor class and provide the data you want to use: from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y) atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) ATOM has multiple data cleaning methods to help you prepare the data for modelling: atom.impute(strat_num=\"knn\", strat_cat=\"most_frequent\", min_frac_rows=0.1) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.05) atom.feature_selection(strategy=\"PCA\", n_features=12) Train and evaluate the models you want to compare: atom.run( models=[\"LR\", \"LDA\", \"XGB\", \"lSVM\"], metric=\"f1\", n_calls=25, n_initial_points=10, bagging=4, ) Make plots to analyze the results: atom.plot_results(figsize=(9, 6), filename=\"bagging_results.png\") atom.lda.plot_confusion_matrix(normalize=True, filename=\"cm.png\")","title":"Getting started"},{"location":"getting_started/#installation","text":"Install ATOM's newest release easily via pip : $ pip install -U atom-ml or via conda : $ conda install -c conda-forge atom-ml Note that using these commands also install/update all required dependencies . To install the optional dependencies , add [models] after the package's name. 
$ pip install -U atom-ml[models] Note Since atom was already taken, download the package under the name atom-ml !","title":"Installation"},{"location":"getting_started/#usage","text":"Call the ATOMClassifier or ATOMRegressor class and provide the data you want to use: from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y) atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) ATOM has multiple data cleaning methods to help you prepare the data for modelling: atom.impute(strat_num=\"knn\", strat_cat=\"most_frequent\", min_frac_rows=0.1) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.05) atom.feature_selection(strategy=\"PCA\", n_features=12) Train and evaluate the models you want to compare: atom.run( models=[\"LR\", \"LDA\", \"XGB\", \"lSVM\"], metric=\"f1\", n_calls=25, n_initial_points=10, bagging=4, ) Make plots to analyze the results: atom.plot_results(figsize=(9, 6), filename=\"bagging_results.png\") atom.lda.plot_confusion_matrix(normalize=True, filename=\"cm.png\")","title":"Usage"},{"location":"license/","text":"MIT License Copyright (c) 2020 tvdboom Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"License"},{"location":"license/#mit-license","text":"Copyright (c) 2020 tvdboom Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"user_guide/","text":"Introduction There is no magic formula in data science that can tell us which type of machine learning estimator in combination with which pipeline will perform best for a given raw dataset. Different models are better suited for different types of data and different types of problems. At best, you can follow some rough guide on how to approach problems with regard to which model to try on your data, but these are incomplete at best. During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? Although best practices tell us to start with a simple model and build up to more complicated ones, many data scientists just use the model best known to them in order to avoid the aforementioned problems. This can result in poor performance (because the model is just not the right one for the task) or in inefficient management of time and computing resources (because a simpler/faster model could have achieved a similar performance). ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. It is important to realize that ATOM is not here to replace all the work a data scientist has to do before getting his model into production. ATOM doesn't spit out production-ready models just by tuning some parameters in its API. After helping you determine the right pipeline, you will most probably need to fine-tune it using use-case specific features and data cleaning steps in order to achieve maximum performance. Nomenclature In this documentation we will consistently use terms to refer to certain concepts related to this package. atom : Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name). ATOM : Refers to this package. branch : Collection of estimators in the pipeline fitted to a specific dataset. See the branches section. BO : Bayesian optimization algorithm used for hyperparameter optimization. categorical columns : Refers to all non-numerical columns. class : Unique value in a column, e.g. a binary classifier has 2 classes in the target column. 
estimator : An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. missing values : Values in the missing attribute. model : Instance of a model in the pipeline. outlier : Sample that contains one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy. outlier value : Value that lies further than 3 times the standard_deviation away from the mean of its column (|z-score| > 3). pipeline : All the content in atom for a specific branch. predictor : An estimator implementing a predict method. This encompasses all classifiers and regressors. scorer : A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation . sequence : A one-dimensional array of variable type list , tuple , np.ndarray or pd.Series . target : Name of the dependent variable, passed as y to an estimator's fit method. task : One of the three supervised machine learning approaches that ATOM supports: binary classification multiclass classification regression trainer : Instance of a class that train and evaluate the models (implement a run method). The following classes are considered trainers: ATOMClassifier ATOMRegressor DirectClassifier DirectRegressor SuccessiveHalvingClassifier SuccessiveHavingRegressor TrainSizingClassifier TrainSizingRegressor transformer : An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes. First steps You can quickly install atom using pip or conda , see the installation guide . ATOM contains a variety of classes to perform data cleaning, feature engineering, model training, plotting and much more. The easiest way to use everything ATOM has to offer is through one of the main classes: ATOMClassifier for binary or multiclass classification tasks. ATOMRegressor for regression tasks. These two classes are convenient wrappers for the whole machine learning pipeline. Like a sklearn Pipeline , they assemble several steps that can be cross-validated together while setting different parameters. There are some important differences with sklearn's API: atom is initialized with the data you want to manipulate. This data can be accessed at any moment through atom's data attributes . The classes in ATOM's API are reached through atom's methods. For example, calling the encode method will initialize an Encoder instance, fit it on the training set and transform the whole dataset. The transformations are applied immediately after calling the method (there is no fit method). This approach gives the user a clearer overview and more control over every step in the pipeline. Let's get started with an example! First, initialize atom and provide it the data you want to use. You can either input a dataset and let ATOM split the train and test set or provide a train and test set already split. Note that if a dataframe is provided, the indices are reset by atom. atom = ATOMClassifier(X, y, test_size=0.25) Apply data cleaning methods through the class. For example, calling the impute method will handle all missing values in the dataset. atom.impute(strat_num=\"median\", strat_cat=\"most_frequent\", min_frac_rows=0.1) Select the best hyperparameters and fit a Random Forest and AdaBoost model. 
atom.run([\"RF\", \"AdaB\"], metric=\"accuracy\", n_calls=25, n_initial_points=10) Analyze the results: atom.feature_importances(show=10, filename=\"feature_importance_plot\") atom.plot_prc(title=\"Precision-recall curve comparison plot\") Data pipelines It may happen that you want to compare how a model performs on different datasets. For example, on one dataset balanced with an undersampling strategy and the other with an oversampling strategy. For this, atom has data pipelines. Branches Data pipelines manage separate paths atom's dataset can take. The paths are called branches and can be accessed through the branch attribute. Calling it will show the branches in the pipeline. The current branch is indicated with ! . A branch contains a specific dataset, and the transformers it took to arrive to that dataset from the one atom initialized with. Accessing data attributes such as atom.dataset will return the data in the current branch. Use the pipeline attribute to see the estimators in the branch. All data cleaning, feature engineering and trainers called will use the dataset in the current branch. This means that models are trained and validated on the data in that branch. Don't change the data in a branch after fitting a model, this can cause unexpected model behaviour. Instead, create a new branch for every unique model pipeline. By default, atom starts with one branch called \"master\". To start a new branch, set a new name to the property, e.g. atom.branch = \"new_branch\" . This will start a new branch from the current one. To create a branch from any other branch type \"_from_\" between the new name and the branch from which to split, e.g. atom.branch = \"branch2_from_branch1\" will create branch \"branch2\" from branch \"branch1\". To switch between existing branches, just type the name of the desired branch, e.g. atom.branch = \"master\" to go back to the main branch. Note that every branch contains a unique copy of the whole dataset! Creating many branches can cause memory issues for large datasets. You can delete a branch either deleting the attribute, e.g. del atom.branch , or using the delete method, e.g. atom.branch.delete() . A branch can only be deleted if no models were trained on its dataset. Use atom.branch.status() to print a list of the transformers and models in the branch. See the Imbalanced datasets or Feature engineering examples for branching use cases. Warning Always create a new branch if you want to change the dataset after fitting a model! Not doing so can cause unexpected model behaviour. Data transformations Performing data transformations is a common requirement of many datasets before they are ready to be ingested by a model. ATOM provides various classes to apply data cleaning and feature engineering transformations to the data. This tooling should be able to help you apply most of the typically needed transformations to get the data ready for modelling. For further fine-tuning, it is also possible to pre-process the data using custom transformers. They can be added to the pipeline using atom's add method. Remember that all transformations are only applied to the dataset in the current branch. AutoML Automated machine learning (AutoML) automates the selection, composition and parameterization of machine learning pipelines. Automating the machine learning process makes it more user-friendly and often provides faster, more accurate outputs than hand-coded algorithms. ATOM uses the TPOT package for AutoML optimization. 
TPOT uses a genetic algorithm to intelligently explore thousands of possible pipelines in order to find the best one for your data. Such an algorithm can be started through the automl method. The resulting data transformers and final estimator are merged with atom's pipeline (check the pipeline and models attributes after the method finishes running). Warning AutoML algorithms aren't intended to run for only a few minutes. If left to its default parameters, the method can take a very long time to finish! Data cleaning More often than not, you need to do some data cleaning before fitting your dataset to a model. Usually, this involves importing different libraries and writing many lines of code. Since ATOM is all about fast exploration and experimentation, it provides various data cleaning classes to apply the most common transformations quickly and easily. Note All of atom's data cleaning methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.scale(verbose=2) . Note Like the add method, the data cleaning methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.scale(columns=[0, 1]) . Scaling the feature set Standardization of a dataset is a common requirement for many machine learning estimators; they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). The Scaler class lets you quickly scale atom's dataset using one of sklearn's scalers. It can be accessed from atom through the scale method. Standard data cleaning There are many data cleaning steps that are useful to perform on any dataset before modelling. These are general rules that apply to almost every use case and every task. The Cleaner class is a convenient tool to apply such steps. It can be accessed from atom through the clean method. Use the class' parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. Imputing missing values For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. Such datasets, however, are incompatible with ATOM's models, which assume that all values in an array are numerical, and that all have and hold meaning. The Imputer class handles missing values in the dataset by either dropping or imputing the value. It can be accessed from atom through the impute method. Tip Use atom's missing attribute to check the number of missing values per feature. Encoding categorical features Many datasets will contain categorical features. Their variables are typically stored as text values which represent various traits. Some examples include color (\"Red\", \"Yellow\", \"Blue\"), size (\"Small\", \"Medium\", \"Large\") or geographic designations (city or country). Regardless of what the value is used for, the challenge is determining how to use this data in the analysis. ATOM's models don't support direct manipulation of this kind of data. Use the Encoder class to encode categorical features to numerical values.
It can be accessed from atom through the encode method. Tip Use atom's categorical attribute for a list of the categorical features in the dataset. Handling outliers When modeling, it is important to clean the data sample to ensure that the observations best represent the problem. Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers. Often, machine learning modeling and model skill in general can be improved by understanding and even removing these outlier samples. The Pruner class offers 7 different strategies to detect outliers (described hereunder). It can be accessed from atom through the prune method. Tip Use atom's outliers attribute to check the number of outliers per column. z-score The z-score of a value in the dataset is defined as the number of standard deviations by which the value is above or below the mean of the column. Values above or below a certain threshold (specified with the parameter max_sigma ) are considered outliers. Note that, contrary to the rest of the strategies, this approach selects outlier values, not outlier samples! Because of this, it is possible to replace the outlier value instead of simply dropping the sample. Isolation Forest Uses a tree-based anomaly detection algorithm. It is based on modeling the normal data in such a way as to isolate anomalies that are both few and different in the feature space. Read more in sklearn's documentation . Elliptic Envelope If the input variables have a Gaussian distribution, then simple statistical methods can be used to detect outliers. For example, if the dataset has two input variables and both are Gaussian, then the feature space forms a multi-dimensional Gaussian and knowledge of this distribution can be used to identify values far from the distribution. This approach can be generalized by defining a hypersphere (ellipsoid) that covers the normal data, and data that falls outside this shape is considered an outlier. Read more in sklearn's documentation . Local Outlier Factor A simple approach to identifying outliers is to locate those examples that are far from the other examples in the feature space. This can work well for feature spaces with low dimensionality (few features) but becomes less reliable as the number of features is increased. This is referred to as the curse of dimensionality. The local outlier factor is a technique that attempts to harness the idea of nearest neighbors for outlier detection. Each example is assigned a score reflecting how isolated it is, or how likely it is to be an outlier, based on the size of its local neighborhood. Those examples with the largest score are more likely to be outliers. Read more in sklearn's documentation . One-class SVM The support vector machine algorithm developed initially for binary classification can be used for one-class classification. When modeling one class, the algorithm captures the density of the majority class and classifies examples on the extremes of the density function as outliers. This modification of SVM is referred to as One-Class SVM. Read more in sklearn's documentation . DBSCAN The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means, which assumes that clusters are convex shaped. Samples that lie outside any cluster are considered outliers. Read more in sklearn's documentation .
OPTICS The OPTICS algorithm shares many similarities with the DBSCAN algorithm, and can be considered a generalization of DBSCAN that relaxes the eps requirement from a single value to a value range. The key difference between DBSCAN and OPTICS is that the OPTICS algorithm builds a reachability graph, which assigns each sample both a reachability distance and a spot within the cluster ordering. These two attributes are assigned when the model is fitted, and are used to determine cluster membership. Read more in sklearn's documentation . Balancing the data One of the common issues found in datasets that are used for classification is imbalanced classes. Data imbalance usually reflects an unequal distribution of classes within a dataset. For example, in a credit card fraud detection dataset, most of the transactions are non-fraud, and very few cases are fraud. This leaves us with a very unbalanced ratio of fraud vs non-fraud cases. The Balancer class can oversample the minority class or undersample the majority class using any of the transformers implemented in imblearn . It can be accessed from atom through the balance method. Feature engineering \"Applied machine learning\" is basically feature engineering. ~ Andrew Ng. Feature engineering is the process of creating new features from the existing ones, in order to capture relationships with the target column that the first set of features didn't have on their own. This process is very important to improve the performance of machine learning algorithms. Although feature engineering works best when the data scientist applies use-case specific transformations, there are ways to do this in an automated manner, without prior domain knowledge. One of the problems of creating new features without human expert intervention is that many of the newly created features can be useless, i.e. they do not help the algorithm to make better predictions. Even worse, having useless features can hurt the model's performance. To avoid this, we perform feature selection, a process in which we select the relevant features in the dataset. See the Feature engineering example. Note All of atom's feature engineering methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, n_jobs=4) . Note Like the add method, the feature engineering methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, columns=slice(5, 15)) . Generating new features The FeatureGenerator class creates new non-linear features based on the original feature set. It can be accessed from atom through the feature_generation method. You can choose between two strategies: Deep Feature Synthesis and Genetic Feature Generation. Deep Feature Synthesis Deep feature synthesis (DFS) applies the selected operators on the features in the dataset. For example, if the operator is \"log\", it will create the new feature LOG(old_feature) and if the operator is \"mul\", it will create the new feature old_feature_1 x old_feature_2 . The operators can be chosen through the operators parameter. Available options are: add: Sum two features together. sub: Subtract two features from each other. mul: Multiply two features with each other. div: Divide two features with each other. sqrt: Take the square root of a feature. log: Take the logarithm of a feature.
sin: Calculate the sine of a feature. cos: Calculate the cosine of a feature. tan: Calculate the tangent of a feature. ATOM's implementation of DFS uses the featuretools package. Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task. Genetic Feature Generation Genetic feature generation (GFG) uses genetic programming , a branch of evolutionary programming, to determine which features are successful and create new ones based on those. Where DFS can be seen as some kind of \"brute force\" for feature engineering, GFG tries to improve its features with every generation of the algorithm. GFG uses the same operators as DFS, but instead of only applying the transformations once, it evolves them further, creating complicated non-linear combinations of features with many transformations. The new features are given the name Feature N for the N-th feature. You can access the genetic features' fitness and description (how they are calculated) through the genetic_features attribute. ATOM uses the SymbolicTransformer class from the gplearn package for the genetic algorithm. Read more about this implementation here . Warning GFG can be slow for very large populations! Selecting useful features The FeatureSelector class provides tooling to select the relevant features from a dataset. It can be accessed from atom through the feature_selection method. The following strategies are implemented: univariate, PCA, SFM, RFE and RFECV. Univariate Univariate feature selection works by selecting the best features based on a univariate statistical F-test. The test is provided via the solver parameter. It takes any function taking two arrays (X, y), and returning arrays (scores, p-values). Read more in sklearn's documentation . Principal Component Analysis Applying PCA will reduce the dimensionality of the dataset by maximizing the variance of each dimension. The new features are called Component 1, Component 2, etc. The data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Read more in sklearn's documentation . Selection from model SFM uses an estimator with feature_importances_ or coef_ attributes to select the best features in a dataset based on importance weights. The estimator is provided through the solver parameter and can already be fitted. ATOM allows you to use one of its predefined models , e.g. solver=\"RF\" . If you didn't call the FeatureSelector through atom, don't forget to indicate the estimator's task by adding _class or _reg after the name, e.g. RF_class to use a random forest classifier. Read more in sklearn's documentation . Recursive feature elimination Select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features.
That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached. Note that, since RFE needs to fit the model again every iteration, this method can be fairly slow. RFECV applies the same algorithm as RFE but uses a cross-validated metric (under the scoring parameter, see RFECV ) to assess every step's performance. Also, where RFE returns the number of features selected by n_features , RFECV returns the number of features that achieved the optimal score on the specified metric. Note that this is not always equal to the amount specified by n_features . Read more in sklearn's documentation . Removing features with low variance Variance is the expectation of the squared deviation of a random variable from its mean. Features with low variance have many values repeated, which means the model will not learn much from them. FeatureSelector removes all features where the same value is repeated in at least max_frac_repeated fraction of the rows. The default option is to remove a feature if all values in it are the same. Read more in sklearn's documentation . Removing features with multi-collinearity Two features that are highly correlated are redundant, i.e. two will not contribute more to the model than only one of them. FeatureSelector will drop a feature that has a Pearson correlation coefficient larger than max_correlation with another feature. A correlation of 1 means the two columns are equal. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE and RFECV strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs. Models Predefined models ATOM provides 31 models for classification and regression tasks that can be used to fit the data in the pipeline. After fitting, every model class is attached to the trainer as an attribute. We refer to these \"subclasses\" as models (see the nomenclature ). The classes contain a variety of attributes and methods to help you understand how the underlying estimator performed. They can be accessed using their acronyms, e.g. atom.LGB to access the LightGBM's model. The available models and their corresponding acronyms are: \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Classification/Regression \"Lasso\" for Lasso Regression \"EN\" for Elastic Net \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost \"LGB\" for LightGBM \"CatB\" for CatBoost \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron Tip The acronyms are case insensitive. You can also use lowercase to call the models, e.g. atom.lgb . 
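As a minimal, illustrative sketch of how a predefined model is reached after training (the attribute and method names used here are the ones mentioned elsewhere in this guide; the model choice is arbitrary):
atom.run(\"LGB\")
print(atom.lgb.metric_test)  # score of the LightGBM model on the test set
atom.LGB.plot_roc()          # model-specific methods are called the same way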
Warning The models should not be initialized by the user! Only use them through the trainers. Custom models It is also possible to use your own models in ATOM's pipeline. For example, imagine we want to use sklearn's Lars estimator (note that it is not included in ATOM's predefined models ). There are two ways to achieve this: Using ATOMModel (recommended). With this approach you can pass the required model characteristics to the pipeline. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel model = ATOMModel(models=Lars, fullname=\"Lars Regression\", needs_scaling=True, type=\"linear\") atom = ATOMRegressor(X, y) atom.run(model) Using the estimator's class or an instance of the class. This approach will also call ATOMModel under the hood, but it will leave its parameters at their default values. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel atom = ATOMRegressor(X, y) atom.run(models=Lars) Additional things to take into account: Custom models are not restricted to sklearn estimators, but they should follow sklearn's API , i.e. have a fit and predict method. Parameter customization (for the initializer) is only possible for custom models which provide an estimator's class or an instance that has a set_params() method, i.e. it's a child class of BaseEstimator . Hyperparameter optimization for custom models is ignored unless appropriate dimensions are provided through bo_params . If the estimator has an n_jobs and/or random_state parameter that is left to its default value, it will automatically adopt the values from the trainer it's called from. Deep learning Deep learning models can be used through ATOM's custom models as long as they follow sklearn's API . For example, models implemented with the Keras package should use the sklearn wrappers KerasClassifier or KerasRegressor . Many deep learning models, for example in computer vision and natural language processing, use datasets with more than 2 dimensions, e.g. image data can have shape (n_samples, length, width, rgb). These data structures are not intended to be stored in a two-dimensional pandas dataframe. Since ATOM requires a dataframe as data container for the dataset, multidimensional data sets are stored in a single column called \"Features\" where every row contains one (multidimensional) sample. Note that, because of this, the data cleaning , feature engineering and some of the plotting methods are unavailable for deep learning datasets. See in this example how to use ATOM to train and validate a Convolutional Neural Network implemented with Keras. Training The training phase is where the models are fitted and evaluated. After this, the models are attached to the trainer and you can use the plotting and predicting methods. The pipeline applies the following steps iteratively for all models: The optimal hyperparameters are selected. The model is trained on the training set and evaluated on the test set. The bagging algorithm is applied. There are three approaches to run the training. Direct training: DirectClassifier DirectRegressor Training via successive halving : SuccessiveHalvingClassifier SuccessiveHalvingRegressor Training via train sizing : TrainSizingClassifier TrainSizingRegressor The direct fashion repeats the aforementioned steps only once, while the other two approaches repeat them more than once. Every approach can be directly called from atom through the run , successive_halving and train_sizing methods respectively.
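As a rough sketch of the three entry points (model acronyms and the metric are only examples; see the sections below for the approach-specific parameters such as train_sizes):
atom.run([\"LR\", \"RF\"], metric=\"f1\")                                 # direct training
atom.successive_halving([\"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\")    # successive halving
atom.train_sizing([\"RF\"], metric=\"f1\")                               # train sizing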
Models are called through their acronyms , e.g. atom.run(models=\"RF\") will train a Random Forest . If you want to run the same model multiple times, add a tag after the acronym to differentiate them. atom.run(models=[\"RF1\", \"RF2\"], est_params={\"RF1\": {\"n_estimators\": 100}, \"RF2\": {\"n_estimators\": 200}}) For example, this pipeline will fit two Random Forest models, one with 100 and the other with 200 decision trees. The models can be accessed through atom.rf1 and atom.rf2 . Use tagged models to test how the same model performs when fitted with different parameters or on different data sets. See the Imbalanced datasets example. Additional things to take into account: Models that need feature scaling will do so automatically before training if they are not already scaled. If an exception is encountered while fitting an estimator, the pipeline will automatically jump to the next model. The errors are stored in the errors attribute. Note that in case a model is skipped, there will be no model subclass for that estimator. When showing the final results, a ! indicates the highest score and a ~ indicates that the model is possibly overfitting (training set has a score at least 20% higher than the test set). The winning model (the one with the highest mean_bagging or metric_test ) can be accessed through the winner attribute. Metric ATOM uses sklearn's SCORERS for model selection and evaluation. A scorer consists of a metric function and some parameters that define the scorer's properties, such as whether it's a score or loss function, or whether the function needs probability estimates or rounded predictions (see make_scorer ). ATOM lets you define the scorer for the pipeline in three ways: The metric parameter is one of sklearn's predefined scorers (as a string). The metric parameter is a score (or loss) function with signature metric(y, y_pred, **kwargs). In this case, use the greater_is_better , needs_proba and needs_threshold parameters to specify the scorer's properties. The metric parameter is a scorer object. Note that all scorers follow the convention that higher return values are better than lower return values. Thus, metrics which measure the distance between the model and the data (i.e. loss functions), like max_error or mean_squared_error , will return the negated value of the metric. Custom scorer acronyms Since some of sklearn's scorers have quite long names and ATOM is all about lazy, fast experimentation, the package provides acronyms for some of the most commonly used ones. These acronyms are case-insensitive and can be used in the metric parameter instead of the scorer's full name, e.g. atom.run(\"LR\", metric=\"BA\") will use balanced_accuracy . The available acronyms are: \"AP\" for \"average_precision\" \"BA\" for \"balanced_accuracy\" \"AUC\" for \"roc_auc\" \"LogLoss\" for \"neg_log_loss\" \"EV\" for \"explained_variance\" \"ME\" for \"max_error\" \"MAE\" for \"neg_mean_absolute_error\" \"MSE\" for \"neg_mean_squared_error\" \"RMSE\" for \"neg_root_mean_squared_error\" \"MSLE\" for \"neg_mean_squared_log_error\" \"MEDAE\" for \"neg_median_absolute_error\" \"POISSON\" for \"neg_mean_poisson_deviance\" \"GAMMA\" for \"neg_mean_gamma_deviance\" Multi-metric runs Sometimes it is useful to measure the performance of the models in more than one way. ATOM lets you run the pipeline with multiple metrics at the same time. To do so, provide the metric parameter with a list of desired metrics, e.g. atom.run(\"LDA\", metric=[\"r2\", \"mse\"]) .
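A custom metric function can be passed in the same way. The function below is purely illustrative (its name and logic are not part of ATOM or sklearn); since a lower value is better, its properties are specified alongside it:
def false_negative_rate(y_true, y_pred):
    # fraction of positive samples that the model misses
    return ((y_true == 1) & (y_pred == 0)).sum() / (y_true == 1).sum()
atom.run(\"LR\", metric=false_negative_rate, greater_is_better=False)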
If you provide metric functions, don't forget to also provide lists to the greater_is_better , needs_proba and needs_threshold parameters, where the n-th value in the list corresponds to the n-th function. If you leave them as a single value, that value will apply to every provided metric. When fitting multi-metric runs, the resulting scores will be a list with one value per metric. For example, if you provided three metrics to the pipeline, atom.knn.metric_bo could return [0.8734, 0.6672, 0.9001]. It is also important to note that only the first metric of a multi-metric run is used to evaluate every step of the bayesian optimization and to select the winning model. Tip Some plots let you choose which of the metrics to show using the metric parameter. Parameter customization By default, the parameters every estimator uses are the same default parameters they get from their respective packages. To select different ones, use est_params . There are two ways to add custom parameters to the models: adding them directly to the dictionary as key-value pairs or through multiple dicts with the model names as keys. Adding the parameters directly to est_params will share them across all models in the pipeline. In this example, both the XGBoost and the LightGBM model will use n_estimators=200. Make sure all the models do have the specified parameters or an exception will be raised! atom.run([\"XGB\", \"LGB\"], est_params={\"n_estimators\": 200}) To specify parameters per model, use the model name as key and a dict of the parameters as value. In this example, the XGBoost model will use n_estimators=200 and the Multi-layer Perceptron will use one hidden layer with 75 neurons. atom.run([\"XGB\", \"MLP\"], est_params={\"XGB\": {\"n_estimators\": 200}, \"MLP\": {\"hidden_layer_sizes\": (75,)}}) Some estimators allow you to pass extra parameters to the fit method (besides X and y). This can be done by adding _fit at the end of the parameter's name. For example, to change XGBoost's verbosity, we can run: atom.run(\"XGB\", est_params={\"verbose_fit\": True}) Note If a parameter is specified through est_params , it is ignored by the bayesian optimization! Hyperparameter optimization In order to achieve maximum performance, we need to tune an estimator's hyperparameters before training it. ATOM provides hyperparameter tuning using a bayesian optimization (BO) approach implemented by skopt . The BO is optimized on the first metric provided with the metric parameter. Each step is either computed by cross-validation on the complete training set or by randomly splitting the training set every iteration into a (sub) training set and a validation set. This process can create some data leakage but ensures maximal use of the provided data. The test set, however, does not contain any leakage and is used to determine the final score of every model. Note that, if the dataset is relatively small, the BO's best score can consistently be lower than the final score on the test set (despite the leakage) due to the considerably fewer instances on which it is trained. There are many possibilities to tune the BO to your liking. Use n_calls and n_initial_points to determine the number of iterations that are performed randomly at the start (exploration) and the number of iterations spent optimizing (exploitation). If n_calls is equal to n_initial_points , every iteration of the BO will select its hyperparameters randomly. This means the algorithm is technically performing a random search .
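To make this concrete, here is a small, illustrative sketch that combines the mechanisms described above: the BO runs for 15 iterations of which the first 5 are random, while the parameter fixed through est_params stays out of the optimization:
atom.run(
    models=\"RF\",
    metric=\"f1\",
    n_calls=15,           # total number of BO iterations
    n_initial_points=5,   # the first 5 iterations are random (exploration)
    est_params={\"criterion\": \"entropy\"},  # fixed value, ignored by the BO
)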
Note The n_calls parameter includes the iterations in n_initial_points , i.e. calling atom.run(\"LR\", n_calls=20, n_initial_points=10) will run 20 iterations of which the first 10 are random. Note If n_initial_points=1 , the first trial is equal to the estimator's default parameters. Other settings can be changed through the bo_params parameter, a dictionary where every key-value combination can be used to further customize the BO. By default, the hyperparameters and corresponding dimensions per model are predefined by ATOM. Use the dimensions key to use custom ones. Just like with est_params , you can share the same dimensions across models or use a dictionary with the model names as keys to specify the dimensions for every individual model. Note that the provided search space dimensions must be compliant with skopt's API. atom.run(\"LR\", n_calls=10, bo_params={\"dimensions\": [Integer(100, 1000, name=\"max_iter\")]}) The majority of skopt's callbacks to stop the optimizer early can be accessed through bo_params . You can include other callbacks using the callbacks key. atom.run(\"LR\", n_calls=10, bo_params={\"max_time\": 1000, \"callbacks\": custom_callback()}) You can also include other parameters for the optimizer as key-value pairs. atom.run(\"LR\", n_calls=10, bo_params={\"acq_func\": \"EI\"}) Bagging After fitting the estimator, you can assess the robustness of the model using bootstrap aggregating (bagging). This technique creates several new data sets by selecting random samples from the training set (with replacement) and evaluates them on the test set. This way we get a distribution of the performance of the model. The number of sets can be chosen through the bagging parameter. Tip Use the plot_results method to plot the bagging scores in a boxplot. Early stopping XGBoost , LightGBM and CatBoost allow in-training evaluation. This means that the estimator is evaluated after every round of the training, and that the training is stopped early if it didn't improve in the last early_stopping rounds. This can save the pipeline much time that would otherwise be wasted on an estimator that is unlikely to improve further. Note that this technique is applied both during the BO and at the final fit on the complete training set. There are two ways to apply early stopping on these models: Through the early_stopping key in bo_params . This approach applies early stopping to all models in the pipeline and allows the input of a fraction of the total number of rounds. Filling the early_stopping_rounds parameter directly in est_params . Don't forget to add _fit to the parameter to call it from the fit method. After fitting, the model will get the evals attribute, a dictionary of the train and test performances per round (also if early stopping wasn't applied). Click here for an example using early stopping. Tip Use the plot_evals method to plot the in-training evaluation on the train and test set. Successive halving Successive halving is a bandit-based algorithm that fits N models to 1/N of the data. The best half is selected for the next iteration, where the process is repeated. This continues until only one model remains, which is fitted on the complete dataset. Beware that a model's performance can depend greatly on the amount of data on which it is trained. For this reason, we recommend using this technique only with similar models, e.g. only tree-based models.
Use successive halving through the SuccessiveHalvingClassifier / SuccessiveHalvingRegressor classes or from atom via the successive_halving method. Consecutive runs of the same model are saved with the model's acronym followed by the number of models in the run. For example, a Random Forest in a run with 4 models would become model RF4 . Click here for a successive halving example. Tip Use the plot_successive_halving method to see every model's performance per iteration of the successive halving. Train sizing When training models, there is usually a trade-off between model performance and computation time that is regulated by the number of samples in the training set. Train sizing can be used to create insights in this trade-off and help determine the optimal size of the training set, fitting the models multiple times, ever increasing the number of samples in the training set. Use train sizing through the TrainSizingClassifier / TrainSizingRegressor classes or from atom via the train_sizing method. The number of iterations and the number of samples per training can be specified with the train_sizes parameter. Consecutive runs of the same model are saved with the model's acronym followed by the fraction of rows in the training set (the . is removed from the fraction!). For example, a Random Forest in a run with 80% of the training samples would become model RF08 . Click here for a train sizing example. Tip Use the plot_learning_curve method to see the model's performance per size of the training set. Voting The idea behind Voting is to combine the predictions of conceptually different models to make new predictions. Such a technique can be useful for a set of equally well performing models in order to balance out their individual weaknesses. Read more in sklearn's documentation . A Voting model is created from a trainer through the voting method. The Voting model is added automatically to the list of models in the pipeline, under the Vote acronym. Although similar, this model is different from the VotingClassifier and VotingRegressor estimators from sklearn. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. Plots that require an estimator object will raise an exception. The Voting class has the same prediction attributes and prediction methods as other models. The predict_proba , predict_log_proba , decision_function and score methods return the average predictions (soft voting) over the models in the instance. Note that these methods will raise an exception if not all estimators in the Voting instance have the specified method. The predict method returns the majority vote (hard voting). The scoring method also returns the average scoring for the selected metric over the models. Click here for a voting example. Warning Although it is possible to include models from different branches in the same Voting instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods. Stacking Stacking is a method for combining estimators to reduce their biases. More precisely, the predictions of each individual estimator are stacked together and used as input to a final estimator to compute the prediction. Read more in sklearn's documentation . A Stacking model is created from a trainer through the stacking method. 
The Stacking model is added automatically to the list of models in the pipeline, under the Stack acronym. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. The prediction methods, the scoring method and the plot methods that require an estimator object will use the Stacking's final estimator, under the estimator attribute. Click here for a stacking example. Warning Although it is possible to include models from different branches in the same Stacking instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods. Predicting After running a successful pipeline, you may want to apply all used transformations onto new data, or make predictions using one of the trained models. Just like a sklearn estimator, you can call the prediction methods from a fitted trainer, e.g. atom.predict(X) . Calling the method without specifying a model will use the winning model in the pipeline (under attribute winner ). To use a different model, simply call the method from a model, e.g. atom.KNN.predict(X) . All prediction methods transform the provided data through the data cleaning and feature engineering transformers before making the predictions. By default, this excludes outlier handling and balancing the dataset since these steps should only be applied on the training set. Use the method's kwargs to select which transformations to use in every call. The available prediction methods are a selection of the most common methods for estimators in sklearn's API: transform Transform new data through all transformers in a branch. predict Transform new data through all transformers in a branch and return class predictions. predict_proba Transform new data through all transformers in a branch and return class probabilities. predict_log_proba Transform new data through all transformers in a branch and return class log-probabilities. decision_function Transform new data through all transformers in a branch and return predicted confidence scores. score Transform new data through all transformers in a branch and return the model's score. Except for transform, the prediction methods can be calculated on the train and test set. You can access them through the model's prediction attributes, e.g. atom.mnb.predict_train or atom.mnb.predict_test . Keep in mind that the results are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Note Many of the plots use the prediction attributes. This can considerably increase the size of the class for large datasets. Use the reset_predictions method if you need to free some memory! Plots After fitting the models to the data, it's time to analyze the results. ATOM provides many plotting methods to compare the model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib , seaborn and shap for plotting. The plot methods can be called from a training class directly, e.g. atom.plot_roc() , or from one of the models, e.g. atom.LGB.plot_roc() . If called from a training class, it will make the plot for all models in the pipeline. This can be useful to compare the results of multiple models. If called from a model, it will make the plot for only that model.
Use this option if you want information just for that specific model or to make a plot less crowded. Parameters Apart from the plot-specific parameters they may have, all plots have four parameters in common: The title parameter allows you to add a custom title to the plot. The figsize parameter adjusts the plot's size. The filename parameter is used to save the plot. The display parameter determines whether the plot is rendered. Aesthetics The plot aesthetics can be customized using the plot attributes, e.g. atom.style = \"white\" . These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. ATOM's default values are: style: \"darkgrid\" palette: \"GnBu_r_d\" title_fontsize: 20 label_fontsize: 16 tick_fontsize: 12 Use the reset_aesthetics method to reset all the aesthetics to their default value. Canvas Sometimes it might be desirable to draw multiple plots side by side in order to compare them more easily. Use atom's canvas method for this. The canvas method is a @contextmanager , i.e. it is used through the with statement. Plots in a canvas will ignore the figsize, filename and display parameters. Instead, call these parameters from the canvas for the final figure. For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot could become too messy. atom = ATOMClassifier(X, y) atom.run([\"xgb\", \"lgb\"], n_calls=0) with atom.canvas(2, 2, title=\"XGBoost vs LightGBM\", filename=\"canvas\"): atom.xgb.plot_roc(dataset=\"both\", title=\"ROC - XGBoost\") atom.lgb.plot_roc(dataset=\"both\", title=\"ROC - LightGBM\") atom.xgb.plot_prc(dataset=\"both\", title=\"PRC - XGBoost\") atom.lgb.plot_prc(dataset=\"both\", title=\"PRC - LightGBM\") SHAP The SHAP (SHapley Additive exPlanations) python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM provides wrappers for 7 of SHAP's plotting functions, which can be called directly from its API. The seven plots are: bar_plot , beeswarm_plot , decision_plot , force_plot , heatmap_plot , scatter_plot and waterfall_plot . Since the plots are not made by ATOM, we can't draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot() . Info You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot . Available plots A list of available plots can be found hereunder. Note that not all plots can be called from every class and that their availability can depend on the task at hand. plot_correlation Plot the data's correlation matrix. plot_scatter_matrix Plot the data's scatter matrix. plot_qq Plot a quantile-quantile plot. plot_distribution Plot column distributions. plot_pipeline Plot a diagram of every estimator in atom's pipeline. plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per component. plot_rfecv Plot the RFECV results. plot_successive_halving Plot the models' scores per iteration of the successive halving.
plot_learning_curve Plot the model's learning curve. plot_results Plot a boxplot of the bagging's results. plot_bo Plot the bayesian optimization scoring. plot_evals Plot evaluation curves for the train and test set. plot_roc Plot the Receiver Operating Characteristics curve. plot_prc Plot the precision-recall curve. plot_permutation_importance Plot the feature permutation importance of models. plot_feature_importance Plot a tree-based model's feature importance. plot_partial_dependence Plot the partial dependence of features. plot_errors Plot a model's prediction errors. plot_residuals Plot a model's residuals. plot_confusion_matrix Plot a model's confusion matrix. plot_threshold Plot metric performances against threshold values. plot_probabilities Plot the probability distribution of the classes in the target column. plot_calibration Plot the calibration curve for a binary classifier. plot_gains Plot the cumulative gains curve. plot_lift Plot the lift curve. bar_plot Plot SHAP's bar plot. beeswarm_plot Plot SHAP's beeswarm plot. decision_plot Plot SHAP's decision plot. force_plot Plot SHAP's force plot. heatmap_plot Plot SHAP's heatmap plot. scatter_plot Plot SHAP's scatter plot. waterfall_plot Plot SHAP's waterfall plot.
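As a brief, illustrative sketch of how some of these plots are called (assuming a fitted atom instance as in the examples above):
atom.plot_correlation()                      # data plot, called from atom itself
atom.rf.plot_confusion_matrix()              # model plot, called from a single model
atom.plot_feature_importance(filename=\"fi\")  # the common filename parameter saves the figure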
It is important to realize that ATOM is not here to replace all the work a data scientist has to do before getting his model into production. ATOM doesn't spit out production-ready models just by tuning some parameters in its API. After helping you determine the right pipeline, you will most probably need to fine-tune it using use-case specific features and data cleaning steps in order to achieve maximum performance.","title":"Introduction"},{"location":"user_guide/#nomenclature","text":"In this documentation we will consistently use terms to refer to certain concepts related to this package. atom : Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name). ATOM : Refers to this package. branch : Collection of estimators in the pipeline fitted to a specific dataset. See the branches section. BO : Bayesian optimization algorithm used for hyperparameter optimization. categorical columns : Refers to all non-numerical columns. class : Unique value in a column, e.g. a binary classifier has 2 classes in the target column. estimator : An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. missing values : Values in the missing attribute. model : Instance of a model in the pipeline. outlier : Sample that contains one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy. outlier value : Value that lies further than 3 times the standard_deviation away from the mean of its column (|z-score| > 3). pipeline : All the content in atom for a specific branch. predictor : An estimator implementing a predict method. This encompasses all classifiers and regressors. scorer : A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation . sequence : A one-dimensional array of variable type list , tuple , np.ndarray or pd.Series . target : Name of the dependent variable, passed as y to an estimator's fit method. task : One of the three supervised machine learning approaches that ATOM supports: binary classification multiclass classification regression trainer : Instance of a class that train and evaluate the models (implement a run method). The following classes are considered trainers: ATOMClassifier ATOMRegressor DirectClassifier DirectRegressor SuccessiveHalvingClassifier SuccessiveHavingRegressor TrainSizingClassifier TrainSizingRegressor transformer : An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes.","title":"Nomenclature"},{"location":"user_guide/#first-steps","text":"You can quickly install atom using pip or conda , see the installation guide . ATOM contains a variety of classes to perform data cleaning, feature engineering, model training, plotting and much more. The easiest way to use everything ATOM has to offer is through one of the main classes: ATOMClassifier for binary or multiclass classification tasks. ATOMRegressor for regression tasks. These two classes are convenient wrappers for the whole machine learning pipeline. Like a sklearn Pipeline , they assemble several steps that can be cross-validated together while setting different parameters. 
There are some important differences with sklearn's API: atom is initialized with the data you want to manipulate. This data can be accessed at any moment through atom's data attributes . The classes in ATOM's API are reached through atom's methods. For example, calling the encode method will initialize an Encoder instance, fit it on the training set and transform the whole dataset. The transformations are applied immediately after calling the method (there is no fit method). This approach gives the user a clearer overview and more control over every step in the pipeline. Let's get started with an example! First, initialize atom and provide it the data you want to use. You can either input a dataset and let ATOM split the train and test set or provide a train and test set already split. Note that if a dataframe is provided, the indices are reset by atom. atom = ATOMClassifier(X, y, test_size=0.25) Apply data cleaning methods through the class. For example, calling the impute method will handle all missing values in the dataset. atom.impute(strat_num=\"median\", strat_cat=\"most_frequent\", min_frac_rows=0.1) Select the best hyperparameters and fit a Random Forest and AdaBoost model. atom.run([\"RF\", \"AdaB\"], metric=\"accuracy\", n_calls=25, n_initial_points=10) Analyze the results: atom.feature_importances(show=10, filename=\"feature_importance_plot\") atom.plot_prc(title=\"Precision-recall curve comparison plot\")","title":"First steps"},{"location":"user_guide/#data-pipelines","text":"It may happen that you want to compare how a model performs on different datasets. For example, on one dataset balanced with an undersampling strategy and the other with an oversampling strategy. For this, atom has data pipelines.","title":"Data pipelines"},{"location":"user_guide/#branches","text":"Data pipelines manage separate paths atom's dataset can take. The paths are called branches and can be accessed through the branch attribute. Calling it will show the branches in the pipeline. The current branch is indicated with ! . A branch contains a specific dataset, and the transformers it took to arrive to that dataset from the one atom initialized with. Accessing data attributes such as atom.dataset will return the data in the current branch. Use the pipeline attribute to see the estimators in the branch. All data cleaning, feature engineering and trainers called will use the dataset in the current branch. This means that models are trained and validated on the data in that branch. Don't change the data in a branch after fitting a model, this can cause unexpected model behaviour. Instead, create a new branch for every unique model pipeline. By default, atom starts with one branch called \"master\". To start a new branch, set a new name to the property, e.g. atom.branch = \"new_branch\" . This will start a new branch from the current one. To create a branch from any other branch type \"_from_\" between the new name and the branch from which to split, e.g. atom.branch = \"branch2_from_branch1\" will create branch \"branch2\" from branch \"branch1\". To switch between existing branches, just type the name of the desired branch, e.g. atom.branch = \"master\" to go back to the main branch. Note that every branch contains a unique copy of the whole dataset! Creating many branches can cause memory issues for large datasets. You can delete a branch either deleting the attribute, e.g. del atom.branch , or using the delete method, e.g. atom.branch.delete() . 
A branch can only be deleted if no models were trained on its dataset. Use atom.branch.status() to print a list of the transformers and models in the branch. See the Imbalanced datasets or Feature engineering examples for branching use cases. Warning Always create a new branch if you want to change the dataset after fitting a model! Not doing so can cause unexpected model behaviour.","title":"Branches"},{"location":"user_guide/#data-transformations","text":"Performing data transformations is a common requirement of many datasets before they are ready to be ingested by a model. ATOM provides various classes to apply data cleaning and feature engineering transformations to the data. This tooling should be able to help you apply most of the typically needed transformations to get the data ready for modelling. For further fine-tuning, it is also possible to pre-process the data using custom transformers. They can be added to the pipeline using atom's add method. Remember that all transformations are only applied to the dataset in the current branch.","title":"Data transformations"},{"location":"user_guide/#automl","text":"Automated machine learning (AutoML) automates the selection, composition and parameterization of machine learning pipelines. Automating the machine learning process makes it more user-friendly and often provides faster, more accurate outputs than hand-coded algorithms. ATOM uses the TPOT package for AutoML optimization. TPOT uses a genetic algorithm to intelligently explore thousands of possible pipelines in order to find the best one for your data. Such an algorithm can be started through the automl method. The resulting data transformers and final estimator are merged with atom's pipeline (check the pipeline and models attributes after the method finishes running). Warning AutoML algorithms aren't intended to run for only a few minutes. If left to its default parameters, the method can take a very long time to finish!","title":"AutoML"},{"location":"user_guide/#data-cleaning","text":"More often than not, you need to do some data cleaning before fitting your dataset to a model. Usually, this involves importing different libraries and writing many lines of code. Since ATOM is all about fast exploration and experimentation, it provides various data cleaning classes to apply the most common transformations fast and easy. Note All of atom's data cleaning methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.scale(verbose=2) . Note Like the add method, the data cleaning methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.scale(columns=[0, 1]) .","title":"Data cleaning"},{"location":"user_guide/#scaling-the-feature-set","text":"Standardization of a dataset is a common requirement for many machine learning estimators; they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). The Scaler class let you quickly scale atom's dataset using one of sklearn's scalers. It can be accessed from atom through the scale method.","title":"Scaling the feature set"},{"location":"user_guide/#standard-data-cleaning","text":"There are many data cleaning steps that are useful to perform on any dataset before modelling. These are general rules that apply almost on every use-case and every task. 
The Cleaner class is a convenient tool to apply such steps. It can be accessed from atom through the clean method. Use the class' parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column.","title":"Standard data cleaning"},{"location":"user_guide/#imputing-missing-values","text":"For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. Such datasets however are incompatible with ATOM's models which assume that all values in an array are numerical, and that all have and hold meaning. The Imputer class handles missing values in the dataset by either dropping or imputing the value. It can be accessed from atom through the impute method. Tip Use atom's missing attribute to check the amount of missing values per feature.","title":"Imputing missing values"},{"location":"user_guide/#encoding-categorical-features","text":"Many datasets will contain categorical features. Their variables are typically stored as text values which represent various traits. Some examples include color (\u201cRed\u201d, \u201cYellow\u201d, \u201cBlue\u201d), size (\u201cSmall\u201d, \u201cMedium\u201d, \u201cLarge\u201d) or geographic designations (city or country). Regardless of what the value is used for, the challenge is determining how to use this data in the analysis. ATOM's models don't support direct manipulation of this kind of data. Use the Encoder class to encode categorical features to numerical values. It can be accessed from atom through the encode method. Tip Use atom's categorical attribute for a list of the categorical features in the dataset.","title":"Encoding categorical features"},{"location":"user_guide/#handling-outliers","text":"When modeling, it is important to clean the data sample to ensure that the observations best represent the problem. Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers. Often, machine learning modeling and model skill in general can be improved by understanding and even removing these outlier samples. The Pruner class offers 5 different strategies to detect outliers (described hereunder). It can be accessed from atom through the prune method. Tip Use atom's outliers attribute to check the number of outliers per column. z-score The z-score of a value in the dataset is defined as the number of standard deviations by which the value is above or below the mean of the column. Values above or below a certain threshold (specified with the parameter max_sigma ) are considered outliers. Note that, contrary to the rest of the strategies, this approach selects outlier values, not outlier samples! Because of this, it is possible to replace the outlier value instead of simply dropping the sample. Isolation Forest Uses a tree-based anomaly detection algorithm. It is based on modeling the normal data in such a way as to isolate anomalies that are both few and different in the feature space. Read more in sklearn's documentation . Elliptic Envelope If the input variables have a Gaussian distribution, then simple statistical methods can be used to detect outliers. 
For example, if the dataset has two input variables and both are Gaussian, then the feature space forms a multi-dimensional Gaussian and knowledge of this distribution can be used to identify values far from the distribution. This approach can be generalized by defining a hypersphere (ellipsoid) that covers the normal data, and data that falls outside this shape is considered an outlier. Read more in sklearn's documentation . Local Outlier Factor A simple approach to identifying outliers is to locate those examples that are far from the other examples in the feature space. This can work well for feature spaces with low dimensionality (few features) but becomes less reliable as the number of features is increased. This is referred to as the curse of dimensionality. The local outlier factor is a technique that attempts to harness the idea of nearest neighbors for outlier detection. Each example is assigned a score of how isolated it is, or how likely it is to be an outlier, based on the size of its local neighborhood. The examples with the largest scores are more likely to be outliers. Read more in sklearn's documentation . One-class SVM The support vector machine algorithm, developed initially for binary classification, can be used for one-class classification. When modeling one class, the algorithm captures the density of the majority class and classifies examples on the extremes of the density function as outliers. This modification of SVM is referred to as One-Class SVM. Read more in sklearn's documentation . DBSCAN The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means which assumes that clusters are convex shaped. Samples that lie outside any cluster are considered outliers. Read more in sklearn's documentation . OPTICS The OPTICS algorithm shares many similarities with the DBSCAN algorithm, and can be considered a generalization of DBSCAN that relaxes the eps requirement from a single value to a value range. The key difference between DBSCAN and OPTICS is that the OPTICS algorithm builds a reachability graph, which assigns each sample both a reachability distance and a spot within the cluster ordering. These two attributes are assigned when the model is fitted and are used to determine cluster membership. Read more in sklearn's documentation .","title":"Handling outliers"},{"location":"user_guide/#balancing-the-data","text":"One of the common issues found in datasets that are used for classification is imbalanced classes. Data imbalance usually reflects an unequal distribution of classes within a dataset. For example, in a credit card fraud detection dataset, most of the transactions are non-fraud and very few are fraud. This leaves us with a very unbalanced ratio of fraud vs non-fraud cases. The Balancer class can oversample the minority class or undersample the majority class using any of the transformers implemented in imblearn . It can be accessed from atom through the balance method.","title":"Balancing the data"},{"location":"user_guide/#feature-engineering","text":"\"Applied machine learning\" is basically feature engineering. ~ Andrew Ng. Feature engineering is the process of creating new features from the existing ones, in order to capture relationships with the target column that the first set of features didn't have on their own. This process is very important for improving the performance of machine learning algorithms.
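To round off the cleaning steps above, here is a minimal, hedged sketch of pruning outliers and balancing the classes on an atom instance (the z-score strategy and max_sigma value mirror the prune signature shown later on this page; balance is left at its defaults):
atom.prune(strategy=\"z-score\", max_sigma=3)  # drop training samples with values beyond 3 sigma
atom.balance()  # over-/undersample the classes with the default strategy (see the Balancer class)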
Although feature engineering works best when the data scientist applies use-case specific transformations, there are ways to do this in an automated manner, without prior domain knowledge. One of the problems of creating new features without human expert intervention is that many of the newly created features can be useless, i.e. they do not help the algorithm to make better predictions. Even worse, having useless features can hurt the model's performance. To avoid this, we perform feature selection, a process in which we select the relevant features in the dataset. See the Feature engineering example. Note All of atom's feature engineering methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, n_jobs=4) . Note Like the add method, the feature engineering methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, columns=slice(5, 15)) .","title":"Feature engineering"},{"location":"user_guide/#generating-new-features","text":"The FeatureGenerator class creates new non-linear features based on the original feature set. It can be accessed from atom through the feature_generation method. You can choose between two strategies: Deep Feature Synthesis and Genetic Feature Generation. Deep Feature Synthesis Deep feature synthesis (DFS) applies the selected operators to the features in the dataset. For example, if the operator is \"log\", it will create the new feature LOG(old_feature) and if the operator is \"mul\", it will create the new feature old_feature_1 x old_feature_2 . The operators can be chosen through the operators parameter. Available options are: add: Sum two features together. sub: Subtract two features from each other. mul: Multiply two features with each other. div: Divide two features by each other. sqrt: Take the square root of a feature. log: Take the logarithm of a feature. sin: Calculate the sine of a feature. cos: Calculate the cosine of a feature. tan: Calculate the tangent of a feature. ATOM's implementation of DFS uses the featuretools package. Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task. Genetic Feature Generation Genetic feature generation (GFG) uses genetic programming , a branch of evolutionary programming, to determine which features are successful and create new ones based on those. Where DFS can be seen as some kind of \"brute force\" approach to feature engineering, GFG tries to improve its features with every generation of the algorithm. GFG uses the same operators as DFS, but instead of only applying the transformations once, it evolves them further, creating complicated non-linear combinations of features with many transformations. The new features are given the name Feature N for the N-th feature.
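As an illustration, a minimal sketch of both strategies through the feature_generation method (the strategy keyword and its accepted values are assumptions based on the descriptions above; the operators list is illustrative):
atom.feature_generation(strategy=\"DFS\", operators=[\"add\", \"mul\", \"log\"])  # deep feature synthesis with a subset of operators
# or, evolving features over several generations instead:
# atom.feature_generation(strategy=\"GFG\", operators=[\"add\", \"mul\", \"log\"])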
You can access the genetic feature's fitness and description (how they are calculated) through the genetic_features attribute. ATOM uses the SymbolicTransformer class from the gplearn package for the genetic algorithm. Read more about this implementation here . Warning GFG can be slow for very large populations!","title":"Generating new features"},{"location":"user_guide/#selecting-useful-features","text":"The FeatureSelector class provides tooling to select the relevant features from a dataset. It can be accessed from atom through the feature_selection method. The following strategies are implemented: univariate, PCA, SFM, RFE and RFECV. Univariate Univariate feature selection works by selecting the best features based on univariate statistical F-test. The test is provided via the solver parameter. It takes any function taking two arrays (X, y), and returning arrays (scores, p-values). Read more in sklearn's documentation . Principal Components Analysis Applying PCA will reduce the dimensionality of the dataset by maximizing the variance of each dimension. The new features are called Component 1, Component 2, etc... The data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Read more in sklearn's documentation . Selection from model SFM uses an estimator with feature_importances_ or coef_ attributes to select the best features in a dataset based on importance weights. The estimator is provided through the solver parameter and can be already fitted. ATOM allows you to use one its predefined models , e.g. solver=\"RF\" . If you didn't call the FeatureSelector through atom, don't forget to indicate the estimator's task adding _class or _reg after the name, e.g. RF_class to use a random forest classifier. Read more in sklearn's documentation . Recursive feature elimination Select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached. Note that, since RFE needs to fit the model again every iteration, this method can be fairly slow. RFECV applies the same algorithm as RFE but uses a cross-validated metric (under the scoring parameter, see RFECV ) to assess every step's performance. Also, where RFE returns the number of features selected by n_features , RFECV returns the number of features that achieved the optimal score on the specified metric. Note that this is not always equal to the amount specified by n_features . Read more in sklearn's documentation . Removing features with low variance Variance is the expectation of the squared deviation of a random variable from its mean. Features with low variance have many values repeated, which means the model will not learn much from them. FeatureSelector removes all features where the same value is repeated in at least max_frac_repeated fraction of the rows. The default option is to remove a feature if all values in it are the same. Read more in sklearn's documentation . Removing features with multi-collinearity Two features that are highly correlated are redundant, i.e. two will not contribute more to the model than only one of them. 
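All of the strategies described in this section are reached through the same method; a minimal sketch (parameter values are illustrative and mirror the feature_selection examples given earlier):
atom.feature_selection(\"PCA\", n_features=10)  # keep the first 10 principal components
# or, selection from a model's importance weights:
# atom.feature_selection(\"SFM\", solver=\"RF\", n_features=15)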
FeatureSelector will drop a feature that has a Pearson correlation coefficient larger than max_correlation with another feature. A correlation of 1 means the two columns are equal. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE and RFECV strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.","title":"Selecting useful features"},{"location":"user_guide/#models","text":"","title":"Models"},{"location":"user_guide/#predefined-models","text":"ATOM provides 31 models for classification and regression tasks that can be used to fit the data in the pipeline. After fitting, every model class is attached to the trainer as an attribute. We refer to these \"subclasses\" as models (see the nomenclature ). The classes contain a variety of attributes and methods to help you understand how the underlying estimator performed. They can be accessed using their acronyms, e.g. atom.LGB to access the LightGBM's model. The available models and their corresponding acronyms are: \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Classification/Regression \"Lasso\" for Lasso Regression \"EN\" for Elastic Net \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost \"LGB\" for LightGBM \"CatB\" for CatBoost \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron Tip The acronyms are case insensitive. You can also use lowercase to call the models, e.g. atom.lgb . Warning The models should not be initialized by the user! Only use them through the trainers.","title":"Predefined models"},{"location":"user_guide/#custom-models","text":"It is also possible to use your own models in ATOM's pipeline. For example, imagine we want to use sklearn's Lars estimator (note that is not included in ATOM's predefined models ). There are two ways to achieve this: Using ATOMModel (recommended). With this approach you can pass the required model characteristics to the pipeline. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel model = ATOMModel(models=Lars, fullname=\"Lars Regression\", needs_scaling=True, type=\"linear\") atom = ATOMRegressor(X, y) atom.run(model) Using the estimator's class or an instance of the class. This approach will also call ATOMModel under the hood, but it will leave its parameters to their default values. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel atom = ATOMRegressor(X, y) atom.run(models=Lars) Additional things to take into account: Custom models are not restricted to sklearn estimators, but they should follow sklearn's API , i.e. 
have a fit and predict method. Parameter customization (for the initializer) is only possible for custom models which provide an estimator's class or an instance that has a set_params() method, i.e. it's a child class of BaseEstimator . Hyperparameter optimization for custom models is ignored unless appropriate dimensions are provided through bo_params . If the estimator has an n_jobs and/or random_state parameter that is left to its default value, it will automatically adopt the values from the trainer it's called from.","title":"Custom models"},{"location":"user_guide/#deep-learning","text":"Deep learning models can be used through ATOM's custom models as long as they follow sklearn's API . For example, models implemented with the Keras package should use the sklearn wrappers KerasClassifier or KerasRegressor . Many deep learning models, for example in computer vision and natural language processing, use datasets with more than 2 dimensions, e.g. image data can have shape (n_samples, length, width, rgb). These data structures are not intended to be stored in a two-dimensional pandas dataframe. Since ATOM requires a dataframe as container for the dataset, multidimensional data sets are stored in a single column called \"Features\" where every row contains one (multidimensional) sample. Note that, because of this, the data cleaning , feature engineering and some of the plotting methods are unavailable for deep learning datasets. See in this example how to use ATOM to train and validate a Convolutional Neural Network implemented with Keras.","title":"Deep learning"},{"location":"user_guide/#training","text":"The training phase is where the models are fitted and evaluated. After this, the models are attached to the trainer and you can use the plotting and predicting methods. The pipeline applies the following steps iteratively for all models: The optimal hyperparameters are selected. The model is trained on the training set and evaluated on the test set. The bagging algorithm is applied. There are three approaches to run the training. Direct training: DirectClassifier DirectRegressor Training via successive halving : SuccessiveHalvingClassifier SuccessiveHalvingRegressor Training via train sizing : TrainSizingClassifier TrainSizingRegressor The direct fashion repeats the aforementioned steps only once, while the other two approaches repeat them more than once. Every approach can be called directly from atom through the run , successive_halving and train_sizing methods respectively. Models are called through their acronyms , e.g. atom.run(models=\"RF\") will train a Random Forest . If you want to run the same model multiple times, add a tag after the acronym to differentiate them. atom.run(models=[\"RF1\", \"RF2\"], est_params={\"RF1\": {\"n_estimators\": 100}, \"RF2\": {\"n_estimators\": 200}}) For example, this pipeline will fit two Random Forest models, one with 100 and the other with 200 decision trees. The models can be accessed through atom.rf1 and atom.rf2 . Use tagged models to test how the same model performs when fitted with different parameters or on different data sets. See the Imbalanced datasets example. Additional things to take into account: Models that need feature scaling will do so automatically before training if they are not already scaled. If an exception is encountered while fitting an estimator, the pipeline will automatically jump to the next model. The errors are stored in the errors attribute.
Note that in case a model is skipped, there will be no model subclass for that estimator. When showing the final results, a ! indicates the highest score and a ~ indicates that the model is possibly overfitting (training set has a score at least 20% higher than the test set). The winning model (the one with the highest mean_bagging or metric_test ) can be accessed through the winner attribute.","title":"Training"},{"location":"user_guide/#metric","text":"ATOM uses sklearn's SCORERS for model selection and evaluation. A scorer consists of a metric function and some parameters that define the scorer's properties such as it's a score or loss function or if the function needs probability estimates or rounded predictions (see make_scorer ). ATOM lets you define the scorer for the pipeline in three ways: The metric parameter is one of sklearn's predefined scorers (as string). The metric parameter is a score (or loss) function with signature metric(y, y_pred, **kwargs). In this case, use the greater_is_better , needs_proba and needs_threshold parameters to specify the scorer's properties. The metric parameter is a scorer object. Note that all scorers follow the convention that higher return values are better than lower return values. Thus, metrics which measure the distance between the model and the data (i.e. loss functions), like max_error or mean_squared_error , will return the negated value of the metric. Custom scorer acronyms Since some of sklearn's scorers have quite long names and ATOM is all about lazy fast experimentation, the package provides acronyms for some of the most commonly used ones. These acronyms are case-insensitive and can be used in the metric parameter instead of the scorer's full name, e.g. atom.run(\"LR\", metric=\"BA\") will use balanced_accuracy . The available acronyms are: \"AP\" for \"average_precision\" \"BA\" for \"balanced_accuracy\" \"AUC\" for \"roc_auc\" \"LogLoss\" for \"neg_log_loss\" \"EV\" for \"explained_variance\" \"ME\" for \"max_error\" \"MAE\" for \"neg_mean_absolute_error\" \"MSE\" for \"neg_mean_squared_error\" \"RMSE\" for \"neg_root_mean_squared_error\" \"MSLE\" for \"neg_mean_squared_log_error\" \"MEDAE\" for \"neg_median_absolute_error\" \"POISSON\" for \"neg_mean_poisson_deviance\" \"GAMMA\" for \"neg_mean_gamma_deviance\" Multi-metric runs Sometimes it is useful to measure the performance of the models in more than one way. ATOM lets you run the pipeline with multiple metrics at the same time. To do so, provide the metric parameter with a list of desired metrics, e.g. atom.run(\"LDA\", metric=[\"r2\", \"mse\"]) . If you provide metric functions, don't forget to also provide lists to the greater_is_better , needs_proba and needs_threshold parameters, where the n-th value in the list corresponds to the n-th function. If you leave them as a single value, that value will apply to every provided metric. When fitting multi-metric runs, the resulting scores will return a list of metrics. For example, if you provided three metrics to the pipeline, atom.knn.metric_bo could return [0.8734, 0.6672, 0.9001]. It is also important to note that only the first metric of a multi-metric run is used to evaluate every step of the bayesian optimization and to select the winning model. 
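For example, a multi-metric run might look like this (a minimal sketch; the model, metrics and number of calls are illustrative):
atom.run(\"LR\", metric=[\"f1\", \"recall\"], n_calls=10)  # only the first metric (f1) drives the BO and the winner selection
print(atom.lr.metric_bo)  # returns one score per provided metric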
Tip Some plots let you choose which of the metrics to show using the metric parameter.","title":"Metric"},{"location":"user_guide/#parameter-customization","text":"By default, the parameters every estimator uses are the same default parameters they get from their respective packages. To select different ones, use est_params . There are two ways to add custom parameters to the models: adding them directly to the dictionary as key-value pairs or through multiple dicts with the model names as keys. Adding the parameters directly to est_params will share them across all models in the pipeline. In this example, both the XGBoost and the LightGBM model will use n_estimators=200. Make sure all the models do have the specified parameters or an exception will be raised! atom.run([\"XGB\", \"LGB\"], est_params={\"n_estimators\": 200}) To specify parameters per model, use the model name as key and a dict of the parameters as value. In this example, the XGBoost model will use n_estimators=200 and the Multi-layer Perceptron will use one hidden layer with 75 neurons. atom.run([\"XGB\", \"MLP\"], est_params={\"XGB\": {\"n_estimators\": 200}, \"MLP\": {\"hidden_layer_sizes\": (75,)}}) Some estimators allow you to pass extra parameters to the fit method (besides X and y). This can be done adding _fit at the end of the parameter. For example, to change XGBoost's verbosity, we can run: atom.run(\"XGB\", est_params={\"verbose_fit\": True}) Note If a parameter is specified through est_params , it is ignored by the bayesian optimization!","title":"Parameter customization"},{"location":"user_guide/#hyperparameter-optimization","text":"In order to achieve maximum performance, we need to tune an estimator's hyperparameters before training it. ATOM provides hyperparameter tuning using a bayesian optimization (BO) approach implemented by skopt . The BO is optimized on the first metric provided with the metric parameter. Each step is either computed by cross-validation on the complete training set or by randomly splitting the training set every iteration into a (sub) training set and a validation set. This process can create some data leakage but ensures maximal use of the provided data. The test set, however, does not contain any leakage and is used to determine the final score of every model. Note that, if the dataset is relatively small, the BO's best score can consistently be lower than the final score on the test set (despite the leakage) due to the considerable fewer instances on which it is trained. There are many possibilities to tune the BO to your liking. Use n_calls and n_initial_points to determine the number of iterations that are performed randomly at the start (exploration) and the number of iterations spent optimizing (exploitation). If n_calls is equal to n_initial_points , every iteration of the BO will select its hyperparameters randomly. This means the algorithm is technically performing a random search . Note The n_calls parameter includes the iterations in n_initial_points , i.e. calling atom.run(\"LR\", n_calls=20, n_intial_points=10) will run 20 iterations of which the first 10 are random. Note If n_initial_points=1 , the first trial is equal to the estimator's default parameters. Other settings can be changed through the bo_params parameter, a dictionary where every key-value combination can be used to further customize the BO. By default, the hyperparameters and corresponding dimensions per model are predefined by ATOM. Use the dimensions key to use custom ones. 
Just like with est_params , you can share the same dimensions across models or use a dictionary with the model names as keys to specify the dimensions for every individual model. Note that the provided search space dimensions must be compliant with skopt's API. atom.run(\"LR\", n_calls=10, bo_params={\"dimensions\": [Integer(100, 1000, name=\"max_iter\")]}) The majority of skopt's callbacks to stop the optimizer early can be accessed through bo_params . You can include other callbacks using the callbacks key. atom.run(\"LR\", n_calls=10, bo_params={\"max_time\": 1000, \"callbacks\": custom_callback()}) You can also include other parameters for the optimizer as key-value pairs. atom.run(\"LR\", n_calls=10, bo_params={\"acq_func\": \"EI\"})","title":"Hyperparameter optimization"},{"location":"user_guide/#bagging","text":"After fitting the estimator, you can asses the robustness of the model using bootstrap aggregating (bagging). This technique creates several new data sets selecting random samples from the training set (with replacement) and evaluates them on the test set. This way we get a distribution of the performance of the model. The number of sets can be chosen through the bagging parameter. Tip Use the plot_results method to plot the bagging scores in a boxplot.","title":"Bagging"},{"location":"user_guide/#early-stopping","text":"XGBoost , LightGBM and CatBoost allow in-training evaluation. This means that the estimator is evaluated after every round of the training, and that the training is stopped early if it didn't improve in the last early_stopping rounds. This can save the pipeline much time that would otherwise be wasted on an estimator that is unlikely to improve further. Note that this technique is applied both during the BO and at the final fit on the complete training set. There are two ways to apply early stopping on these models: Through the early_stopping key in bo_params . This approach applies early stopping to all models in the pipeline and allows the input of a fraction of the total number of rounds. Filling the early_stopping_rounds parameter directly in est_params . Don't forget to add _fit to the parameter to call it from the fit method. After fitting, the model will get the evals attribute, a dictionary of the train and test performances per round (also if early stopping wasn't applied). Click here for an example using early stopping. Tip Use the plot_evals method to plot the in-training evaluation on the train and test set.","title":"Early stopping"},{"location":"user_guide/#successive-halving","text":"Successive halving is a bandit-based algorithm that fits N models to 1/N of the data. The best half are selected to go to the next iteration where the process is repeated. This continues until only one model remains, which is fitted on the complete dataset. Beware that a model's performance can depend greatly on the amount of data on which it is trained. For this reason, we recommend only to use this technique with similar models, e.g. only using tree-based models. Use successive halving through the SuccessiveHalvingClassifier / SuccessiveHalvingRegressor classes or from atom via the successive_halving method. Consecutive runs of the same model are saved with the model's acronym followed by the number of models in the run. For example, a Random Forest in a run with 4 models would become model RF4 . Click here for a successive halving example. 
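A minimal sketch of such a run with four comparable tree-based models (the model and metric choices are illustrative):
atom.successive_halving([\"Tree\", \"Bag\", \"ET\", \"RF\"], metric=\"f1\")  # 4 models on 1/4 of the data, then 2 on 1/2, then 1 on all
# if the random forest reaches the final iteration, it is accessible as atom.rf1 (and as atom.rf4 for the first iteration)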
Tip Use the plot_successive_halving method to see every model's performance per iteration of the successive halving.","title":"Successive halving"},{"location":"user_guide/#train-sizing","text":"When training models, there is usually a trade-off between model performance and computation time that is regulated by the number of samples in the training set. Train sizing can be used to create insights in this trade-off and help determine the optimal size of the training set, fitting the models multiple times, ever increasing the number of samples in the training set. Use train sizing through the TrainSizingClassifier / TrainSizingRegressor classes or from atom via the train_sizing method. The number of iterations and the number of samples per training can be specified with the train_sizes parameter. Consecutive runs of the same model are saved with the model's acronym followed by the fraction of rows in the training set (the . is removed from the fraction!). For example, a Random Forest in a run with 80% of the training samples would become model RF08 . Click here for a train sizing example. Tip Use the plot_learning_curve method to see the model's performance per size of the training set.","title":"Train sizing"},{"location":"user_guide/#voting","text":"The idea behind Voting is to combine the predictions of conceptually different models to make new predictions. Such a technique can be useful for a set of equally well performing models in order to balance out their individual weaknesses. Read more in sklearn's documentation . A Voting model is created from a trainer through the voting method. The Voting model is added automatically to the list of models in the pipeline, under the Vote acronym. Although similar, this model is different from the VotingClassifier and VotingRegressor estimators from sklearn. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. Plots that require an estimator object will raise an exception. The Voting class has the same prediction attributes and prediction methods as other models. The predict_proba , predict_log_proba , decision_function and score methods return the average predictions (soft voting) over the models in the instance. Note that these methods will raise an exception if not all estimators in the Voting instance have the specified method. The predict method returns the majority vote (hard voting). The scoring method also returns the average scoring for the selected metric over the models. Click here for a voting example. Warning Although it is possible to include models from different branches in the same Voting instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.","title":"Voting"},{"location":"user_guide/#stacking","text":"Stacking is a method for combining estimators to reduce their biases. More precisely, the predictions of each individual estimator are stacked together and used as input to a final estimator to compute the prediction. Read more in sklearn's documentation . A Stacking model is created from a trainer through the stacking method. The Stacking model is added automatically to the list of models in the pipeline, under the Stack acronym. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. 
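A hedged sketch of adding both ensembles after a regular run (the model selection is illustrative):
atom.run([\"LR\", \"RF\", \"LGB\"])
atom.voting(models=[\"LR\", \"RF\", \"LGB\"])  # added to the pipeline under the Vote acronym
atom.stacking(models=[\"LR\", \"RF\", \"LGB\"], estimator=\"LR\")  # added under the Stack acronym, with Logistic Regression as final estimator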
The prediction methods, the scoring method and the plot methods that require an estimator object will use the Stacking's final estimator, under the estimator attribute. Click here for a stacking example. Warning Although it is possible to include models from different branches in the same Stacking instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.","title":"Stacking"},{"location":"user_guide/#predicting","text":"After running a successful pipeline, you may want to apply all used transformations to new data, or make predictions using one of the trained models. Just like a sklearn estimator, you can call the prediction methods from a fitted trainer, e.g. atom.predict(X) . Calling the method without specifying a model will use the winning model in the pipeline (under attribute winner ). To use a different model, simply call the method from a model, e.g. atom.KNN.predict(X) . All prediction methods transform the provided data through the data cleaning and feature engineering transformers before making the predictions. By default, this excludes outlier handling and balancing the dataset since these steps should only be applied on the training set. Use the method's kwargs to select which transformations to use in every call. The available prediction methods are a selection of the most common methods for estimators in sklearn's API: transform Transform new data through all transformers in a branch. predict Transform new data through all transformers in a branch and return class predictions. predict_proba Transform new data through all transformers in a branch and return class probabilities. predict_log_proba Transform new data through all transformers in a branch and return class log-probabilities. decision_function Transform new data through all transformers in a branch and return predicted confidence scores. score Transform new data through all transformers in a branch and return the model's score. Except for transform, the prediction methods can be calculated on the train and test set. You can access them through the model's prediction attributes, e.g. atom.mnb.predict_train or atom.mnb.predict_test . Keep in mind that the results are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Note Many of the plots use the prediction attributes. This can considerably increase the size of the class for large datasets. Use the reset_predictions method if you need to free some memory!","title":"Predicting"},{"location":"user_guide/#plots","text":"After fitting the models to the data, it's time to analyze the results. ATOM provides many plotting methods to compare the model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib , seaborn and shap for plotting. The plot methods can be called from a trainer directly, e.g. atom.plot_roc() , or from one of the models, e.g. atom.LGB.plot_roc() . If called from a trainer, it will make the plot for all models in the pipeline. This can be useful to compare the results of multiple models. If called from a model, it will make the plot for only that model.
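For instance (a minimal sketch; the models are assumed to have been fitted in a previous run):
atom.plot_roc()  # one ROC curve per model in the pipeline
atom.lgb.plot_roc()  # only LightGBM's ROC curve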
Use this option if you want information just for that specific model or to make a plot less crowded.","title":"Plots"},{"location":"user_guide/#parameters","text":"Apart from the plot-specific parameters they may have, all plots have four parameters in common: The title parameter allows you to add a custom title to the plot. The figsize parameter adjusts the plot's size. The filename parameter is used to save the plot. The display parameter determines whether the plot is rendered.","title":"Parameters"},{"location":"user_guide/#aesthetics","text":"The plot aesthetics can be customized using the plot attributes, e.g. atom.style = \"white\" . These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. ATOM's default values are: style: \"darkgrid\" palette: \"GnBu_r_d\" title_fontsize: 20 label_fontsize: 16 tick_fontsize: 12 Use the reset_aesthetics method to reset all the aesthetics to their default value.","title":"Aesthetics"},{"location":"user_guide/#canvas","text":"Sometimes it might be desirable to draw multiple plots side by side in order to compare them more easily. Use atom's canvas method for this. The canvas method is a @contextmanager , i.e. it is used through the with statement. Plots in a canvas will ignore the figsize, filename and display parameters. Instead, call these parameters from the canvas for the final figure. For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot could become too messy. atom = ATOMClassifier(X, y) atom.run([\"xgb\", \"lgb\"], n_calls=0) with atom.canvas(2, 2, title=\"XGBoost vs LightGBM\", filename=\"canvas\"): atom.xgb.plot_roc(dataset=\"both\", title=\"ROC - XGBoost\") atom.lgb.plot_roc(dataset=\"both\", title=\"ROC - LightGBM\") atom.xgb.plot_prc(dataset=\"both\", title=\"PRC - XGBoost\") atom.lgb.plot_prc(dataset=\"both\", title=\"PRC - LightGBM\")","title":"Canvas"},{"location":"user_guide/#shap","text":"The SHAP (SHapley Additive exPlanations) python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM implements methods to plot 7 of SHAP's plotting functions directly from its API. The seven plots are: bar_plot , beeswarm_plot , decision_plot , force_plot , heatmap_plot , scatter_plot and waterfall_plot . Since the plots are not made by ATOM, we can't draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot() . Info You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot .","title":"SHAP"},{"location":"user_guide/#available-plots","text":"A list of the available plots can be found hereunder. Note that not all plots can be called from every class and that their availability can depend on the task at hand. plot_correlation Plot the data's correlation matrix. plot_scatter_matrix Plot the data's scatter matrix. plot_qq Plot a quantile-quantile plot. plot_distribution Plot column distributions. plot_pipeline Plot a diagram of every estimator in atom's pipeline.
plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per components. plot_rfecv Plot the RFECV results. plot_successive_halving Plot of the models\" scores per iteration of the successive halving. plot_learning_curve Plot the model's learning curve. plot_results Plot a boxplot of the bagging's results. plot_bo Plot the bayesian optimization scoring. plot_evals Plot evaluation curves for the train and test set. plot_roc Plot the Receiver Operating Characteristics curve. plot_prc Plot the precision-recall curve. plot_permutation_importance Plot the feature permutation importance of models. plot_feature_importance Plot a tree-based model's feature importance. plot_partial_dependence Plot the partial dependence of features. plot_errors Plot a model's prediction errors. plot_residuals Plot a model's residuals. plot_confusion_matrix Plot a model's confusion matrix. plot_threshold Plot metric performances against threshold values. plot_probabilities Plot the probability distribution of the classes in the target column. plot_calibration Plot the calibration curve for a binary classifier. plot_gains Plot the cumulative gains curve. plot_lift Plot the lift curve. bar_plot Plot SHAP's bar plot. beeswarm_plot Plot SHAP's beeswarm plot. decision_plot Plot SHAP's decision plot. force_plot Plot SHAP's force plot. heatmap_plot Plot SHAP's heatmap plot. scatter_plot Plot SHAP's scatter plot. waterfall_plot Plot SHAP's waterfall plot.","title":"Available plots"},{"location":"API/ATOM/ATOMLoader/","text":"ATOMLoader function ATOMLoader (filename, data=None, transform_data=True, verbose=None) [source] Load a class instance from a pickle file. If the file is a trainer that was saved using save_data=False , you can load new data into it. For atom pickles, you can also apply all data transformations in the pipeline to the data. Parameters: filename: str Name of the pickle file to load. data: tuple of indexables or None, optional (default=None) Tuple containing the features and target data. Only use this parameter if the file is a trainer that was saved using save_data=False (see the save method). Allowed formats are: X or X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). If no y is provided, the last column is used as target. y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). transform_data: bool, optional (default=True) If False, the data is left as provided. If True, it is transformed through all the steps in the instance's pipeline. This parameter is ignored if the loaded file is not an atom pickle. verbose: int or None, optional (default=None) Verbosity level of the transformations applied on the new data. If None, use the verbosity from the loaded instance. This parameter is ignored if transform_data=False . 
Example from atom import ATOMClassifier, ATOMLoader # Save an atom instance to a pickle file atom = ATOMClassifier(X, y) atom.encode(strategy=\"Helmert\", max_onehot=12) atom.run(\"LR\", metric=\"AP\", n_calls=25, n_initial_points=10) atom.save(\"atom_lr\", save_data=False) # Load the class and add the transformed data to the new instance atom_2 = ATOMLoader(\"atom_lr\", data=(X, y), verbose=0)","title":"ATOMLoader"},{"location":"API/ATOM/ATOMLoader/#atomloader","text":"function ATOMLoader (filename, data=None, transform_data=True, verbose=None) [source] Load a class instance from a pickle file. If the file is a trainer that was saved using save_data=False , you can load new data into it. For atom pickles, you can also apply all data transformations in the pipeline to the data. Parameters: filename: str Name of the pickle file to load. data: tuple of indexables or None, optional (default=None) Tuple containing the features and target data. Only use this parameter if the file is a trainer that was saved using save_data=False (see the save method). Allowed formats are: X or X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). If no y is provided, the last column is used as target. y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). transform_data: bool, optional (default=True) If False, the data is left as provided. If True, it is transformed through all the steps in the instance's pipeline. This parameter is ignored if the loaded file is not an atom pickle. verbose: int or None, optional (default=None) Verbosity level of the transformations applied on the new data. If None, use the verbosity from the loaded instance. This parameter is ignored if transform_data=False .","title":"ATOMLoader"},{"location":"API/ATOM/ATOMLoader/#example","text":"from atom import ATOMClassifier, ATOMLoader # Save an atom instance to a pickle file atom = ATOMClassifier(X, y) atom.encode(strategy=\"Helmert\", max_onehot=12) atom.run(\"LR\", metric=\"AP\", n_calls=25, n_initial_points=10) atom.save(\"atom_lr\", save_data=False) # Load the class and add the transformed data to the new instance atom_2 = ATOMLoader(\"atom_lr\", data=(X, y), verbose=0)","title":"Example"},{"location":"API/ATOM/atomclassifier/","text":"ATOMClassifier class atom.api. ATOMClassifier (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMClassifier is ATOM's wrapper for binary and multiclass classification tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMClassifier instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. 
Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Magic methods The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset. Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. mapping: dict Target classes mapped to their respective encoded integer. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. classes: pd.DataFrame Distribution of classes per data set. n_classes: int Number of classes in the target column. Utility attributes Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Utility methods The ATOMClassifier class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. get_class_weight Return class weights for a balanced dataset. log Save information to the logger and print to stdout. 
report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's classifier. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. 
Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. 
This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Data cleaning ATOMClassifier provides data cleaning methods to scale your features and handle missing values, categorical columns, outliers and unbalanced datasets. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. balance Balance the target classes in the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the tranformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. 
The definition of outlier depends on the selected strategy and can greatly differ from one strategy to another. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. Ignores categorical columns. See Pruner for a description of the parameters. method balance (strategy=\"ADASYN\", **kwargs) [source] Balance the number of samples per target class in the target column. Only the training set is balanced in order to maintain the original distribution of target classes in the test set. See Balancer for a description of the parameters. Feature engineering To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified). Training The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectClassifier instance.
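For illustration, a minimal sketch of a direct run. The dataset, model acronyms and metric below are assumptions chosen for the example, not requirements of the method; see DirectClassifier for the full parameter descriptions.

from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, verbose=1)

# Fit two predefined models on the current branch (acronyms assumed available)
atom.run(models=[\"LR\", \"RF\"], metric=\"f1\", bagging=4)

print(atom.results)      # training results (metric_train, metric_test, times, ...)
print(atom.winner.name)  # model that performed best on the test set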
method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingClassifier instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingClassifier instance. Example from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y=True) # Initialize class atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", max_sigma=2) atom.balance(strategy=\"smote\", sampling_strategy=0.7) # Fit the models to the data atom.run( models=[\"QDA\", \"CatB\"], metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_roc(figsize=(9, 6), filename=\"roc.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"LR\", metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"ATOMClassifier"},{"location":"API/ATOM/atomclassifier/#atomclassifier","text":"class atom.api. ATOMClassifier (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMClassifier is ATOM's wrapper for binary and multiclass classification tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMClassifier instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. 
Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"ATOMClassifier"},{"location":"API/ATOM/atomclassifier/#magic-methods","text":"The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset.","title":"Magic methods"},{"location":"API/ATOM/atomclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/ATOM/atomclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. mapping: dict Target classes mapped to their respective encoded integer. 
scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. classes: pd.DataFrame Distribution of classes per data set. n_classes: int Number of classes in the target column.","title":"Data attributes"},{"location":"API/ATOM/atomclassifier/#utility-attributes","text":"Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/ATOM/atomclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/ATOM/atomclassifier/#utility-methods","text":"The ATOMClassifier class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. get_class_weight Return class weights for a balanced dataset. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. 
stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's classifier. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. 
Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. 
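As a hedged sketch of how a few of these utility methods combine, assuming the function passed to apply should return the new column values and that PCA is an acceptable transformer for add:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, verbose=0)

atom.add(PCA(n_components=5))                        # fitted on the training set, then added to the pipeline
atom.apply(lambda df: df.iloc[:, 0] ** 2, column=0)  # func receives the complete dataset as a pd.DataFrame
weights = atom.get_class_weight(dataset=\"train\")     # dict mapping classes to their balancing weight
pipeline = atom.export_pipeline()                    # fitted sklearn Pipeline of the current branch
profile = atom.report(dataset=\"train\", n_rows=500)   # ProfileReport of a 500-row sample of the training set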
method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. 
weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Utility methods"},{"location":"API/ATOM/atomclassifier/#data-cleaning","text":"ATOMClassifier provides data cleaning methods to scale your features and handle missing values, categorical columns, outliers and unbalanced datasets. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. balance Balance the target classes in the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the transformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique <= max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one strategy to another. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. Ignores categorical columns. See Pruner for a description of the parameters.
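A possible cleaning sequence, assuming atom is an ATOMClassifier instance. The imputation strategies are assumptions; the other values shown are the documented defaults.

atom.clean()                                                 # standard data cleaning steps listed above
atom.impute(strat_num=\"median\", strat_cat=\"most_frequent\")   # see Imputer for the accepted strategies
atom.encode(strategy=\"LeaveOneOut\", max_onehot=10)           # encode categorical features
atom.prune(strategy=\"z-score\", max_sigma=3)                  # drop outliers from the training set only
atom.scale()                                                 # scale the numerical features
atom.balance(strategy=\"ADASYN\")                              # balance the target classes in the training set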
method balance (strategy=\"ADASYN\", **kwargs) [source] Balance the number of samples per target class in the target column. Only the training set is balanced in order to maintain the original distribution of target classes in the test set. See Balancer for a description of the parameters.","title":"Data cleaning"},{"location":"API/ATOM/atomclassifier/#feature-engineering","text":"To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified).","title":"Feature engineering"},{"location":"API/ATOM/atomclassifier/#training","text":"The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectClassifier instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingClassifier instance. 
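For example, a successive halving run over four models might look as follows; the acronyms, metric and parameter values are assumptions for the sketch, and atom is assumed to be an initialized ATOMClassifier instance.

atom.successive_halving(
    models=[\"LR\", \"RF\", \"LGB\", \"CatB\"],  # acronyms of predefined models (assumed available)
    metric=\"f1\",
    skip_runs=1,
    bagging=5,
)
print(atom.results)  # overview of all runs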
method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingClassifier instance.","title":"Training"},{"location":"API/ATOM/atomclassifier/#example","text":"from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y=True) # Initialize class atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", max_sigma=2) atom.balance(strategy=\"smote\", sampling_strategy=0.7) # Fit the models to the data atom.run( models=[\"QDA\", \"CatB\"], metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_roc(figsize=(9, 6), filename=\"roc.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"LR\", metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"Example"},{"location":"API/ATOM/atommodel/","text":"ATOMModel function ATOMModel (estimator, acronym=None, fullname=None, needs_scaling=False) [source] Convert an estimator to a model that can be ingested by ATOM. Parameters: estimator: class Model's estimator. Can be a class or an instance. acronym: str or None, optional (default=None) Model's acronym. Used to call the model from the trainer. If None, the capital letters in the estimator's __name__ are used (if 2 or more, else it uses the entire name). fullname: str or None, optional (default=None) Full model's name. If None, the estimator's __name__ is used. needs_scaling: bool, optional (default=False) Whether the model needs scaled features. Can not be True for deep learning datasets. Example from atom import ATOMRegressor, ATOMModel from sklearn.linear_model import HuberRegressor model = ATOMModel(HuberRegressor, name=\"hub\", fullname=\"Huber\", needs_scaling=True) atom = ATOMRegressor(X, y) atom.run(model) atom.hub.predict(X_new)","title":"ATOMModel"},{"location":"API/ATOM/atommodel/#atommodel","text":"function ATOMModel (estimator, acronym=None, fullname=None, needs_scaling=False) [source] Convert an estimator to a model that can be ingested by ATOM. Parameters: estimator: class Model's estimator. Can be a class or an instance. acronym: str or None, optional (default=None) Model's acronym. Used to call the model from the trainer. If None, the capital letters in the estimator's __name__ are used (if 2 or more, else it uses the entire name). fullname: str or None, optional (default=None) Full model's name. If None, the estimator's __name__ is used. needs_scaling: bool, optional (default=False) Whether the model needs scaled features. Can not be True for deep learning datasets.","title":"ATOMModel"},{"location":"API/ATOM/atommodel/#example","text":"from atom import ATOMRegressor, ATOMModel from sklearn.linear_model import HuberRegressor model = ATOMModel(HuberRegressor, name=\"hub\", fullname=\"Huber\", needs_scaling=True) atom = ATOMRegressor(X, y) atom.run(model) atom.hub.predict(X_new)","title":"Example"},{"location":"API/ATOM/atomregressor/","text":"ATOMRegressor class atom.api. 
ATOMRegressor (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Magic methods The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset. 
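A quick sketch of these magic methods; the dataset and the column name are assumptions chosen for the example.

from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
atom = ATOMRegressor(X, y, verbose=0)

print(atom)               # __repr__: overview of branches, models, metric and errors
print(len(atom))          # __len__: number of transformers in the pipeline
print(\"bmi\" in atom)      # __contains__: is \"bmi\" a column in the dataset?
column = atom[\"bmi\"]      # __getitem__ with a str: access a column in the dataset
for transformer in atom:  # __iter__: loop over the pipeline's transformers (still empty here)
    print(transformer)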
Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. Utility attributes Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . 
title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Utility methods The ATOMRegressor class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. 
The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's regressor. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. 
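For instance, assuming atom is an ATOMRegressor instance with a logger configured at initialization:

atom.log(\"Starting the exploratory analysis\", level=1)  # written to the logger; printed when the verbosity is at least 1
profile = atom.report(dataset=\"train\", n_rows=1000)     # ProfileReport of (a sample of) the training set
atom.reset_aesthetics()                                 # restore the default plot aesthetics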
method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Data cleaning ATOMRegressor provides data cleaning methods to scale your features and handle missing values, categorical columns and outliers. Calling on one of them will automatically apply the method on the dataset in the pipeline. 
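A short sketch, assuming atom is an ATOMRegressor instance; the imputation strategy and max_onehot value are assumptions.

atom.impute(strat_num=\"median\")  # handle missing values
atom.encode(max_onehot=5)        # encode categorical columns, if any
atom.scale()                     # scale the numerical features
print(atom.pipeline)             # each call above is now a step in the current branch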
Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the transformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. See Pruner for a description of the parameters. Feature engineering To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. 
See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_regression is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the regressor (no need to add _reg to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified). Training The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectRegressor instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingRegressor instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingRegressor instance. 
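The Example section that follows demonstrates run; the other two approaches are called the same way, as in the sketch below (model names and settings are illustrative, and passing train_sizes as a plain list of fractions is an assumption):

from sklearn.datasets import load_boston
from atom import ATOMRegressor

X, y = load_boston(return_X_y=True)
atom = ATOMRegressor(X, y)

# Successive halving: start with several models and keep the best half after every run
atom.successive_halving(models=["OLS", "BR", "RF", "ET"], metric="r2", skip_runs=0)

# Train sizing: fit the models on increasingly larger fractions of the training set
atom.train_sizing(models=["OLS", "RF"], metric="r2", train_sizes=[0.2, 0.4, 0.6, 0.8, 1.0])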
Example from sklearn.datasets import load_boston from atom import ATOMRegressor X, y = load_boston(return_X_y=True) # Initialize class atom = ATOMRegressor(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2, include_target=True) # Fit the models to the data atom.run( models=[\"OLS\", \"BR\", \"CatB\"], metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_errors(figsize=(9, 6), filename=\"errors.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"MLP\", metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"ATOMRegressor"},{"location":"API/ATOM/atomregressor/#atomregressor","text":"class atom.api. ATOMRegressor (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. 
logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"ATOMRegressor"},{"location":"API/ATOM/atomregressor/#magic-methods","text":"The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset.","title":"Magic methods"},{"location":"API/ATOM/atomregressor/#attributes","text":"","title":"Attributes"},{"location":"API/ATOM/atomregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. 
n_outliers: int Number of samples in the training set containing outliers.","title":"Data attributes"},{"location":"API/ATOM/atomregressor/#utility-attributes","text":"Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/ATOM/atomregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/ATOM/atomregressor/#utility-methods","text":"The ATOMRegressor class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. 
This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's regressor. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. 
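To make the add, apply and drop semantics described above concrete, a small sketch (the PCA transformer and the lambda are arbitrary choices for illustration):

from sklearn.datasets import load_boston
from sklearn.decomposition import PCA
from atom import ATOMRegressor

X, y = load_boston(return_X_y=True)
atom = ATOMRegressor(X, y)

atom.drop(columns=[0, 3])                           # drop two columns through the pipeline
atom.add(PCA(n_components=5))                       # fit PCA on the training set and add it to the pipeline
atom.apply(lambda df: df.iloc[:, 0] * 2, column=0)  # transform the first column in place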
method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. 
If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Utility methods"},{"location":"API/ATOM/atomregressor/#data-cleaning","text":"ATOMRegressor provides data cleaning methods to scale your features and handle missing values, categorical columns and outliers. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the tranformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. 
Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. See Pruner for a description of the parameters.","title":"Data cleaning"},{"location":"API/ATOM/atomregressor/#feature-engineering","text":"To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_regression is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the regressor (no need to add _reg to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified).","title":"Feature engineering"},{"location":"API/ATOM/atomregressor/#training","text":"The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. 
train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectRegressor instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingRegressor instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingRegressor instance.","title":"Training"},{"location":"API/ATOM/atomregressor/#example","text":"from sklearn.datasets import load_boston from atom import ATOMRegressor X, y = load_boston(return_X_y=True) # Initialize class atom = ATOMRegressor(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2, include_target=True) # Fit the models to the data atom.run( models=[\"OLS\", \"BR\", \"CatB\"], metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_errors(figsize=(9, 6), filename=\"errors.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"MLP\", metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"Example"},{"location":"API/data_cleaning/balancer/","text":"Balancer class atom.data_cleaning. Balancer (strategy=\"ADASYN\", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide . Parameters: strategy: str, optional (default=\"ADASYN\") Type of algorithm to use for oversampling or undersampling. Choose from one of the estimators available in the imbalanced-learn package. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's classes attribute for an overview of the target class distribution per data set. 
Attributes Attributes: <strategy>: imblearn estimator Estimator instance (lowercase strategy) used to oversample or undersample the data, e.g. balancer.adasyn for the default strategy. mapping: dict Dictionary of the target values mapped to their respective encoded integer. Methods get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Balancer Estimator instance. method transform (X, y) [source] Oversample or undersample the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.balance(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) or from atom.data_cleaning import Balancer balancer = Balancer(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) X_train, y_train = balancer.transform(X_train, y_train)","title":"Balancer"},{"location":"API/data_cleaning/balancer/#balancer","text":"class atom.data_cleaning. Balancer (strategy=\"ADASYN\", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide . Parameters: strategy: str, optional (default=\"ADASYN\") Type of algorithm to use for oversampling or undersampling. Choose from one of the estimators available in the imbalanced-learn package. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. 
random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's classes attribute for an overview of the target class distribution per data set.","title":"Balancer"},{"location":"API/data_cleaning/balancer/#attributes","text":"Attributes: <strategy>: imblearn estimator Estimator instance (lowercase strategy) used to oversample or undersample the data, e.g. balancer.adasyn for the default strategy. mapping: dict Dictionary of the target values mapped to their respective encoded integer.","title":"Attributes"},{"location":"API/data_cleaning/balancer/#methods","text":"get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Balancer Estimator instance. method transform (X, y) [source] Oversample or undersample the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column.","title":"Methods"},{"location":"API/data_cleaning/balancer/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.balance(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) or from atom.data_cleaning import Balancer balancer = Balancer(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) X_train, y_train = balancer.transform(X_train, y_train)","title":"Example"},{"location":"API/data_cleaning/cleaner/","text":"Cleaner class atom.data_cleaning. Cleaner (prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True, strip_categorical=True, drop_duplicates=False, missing_target=True, encode_target=True, verbose=0, logger=None) [source] Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. This class can be accessed from atom through the clean method. Read more in the user guide . 
Parameters: prohibited_types: str, sequence or None, optional (default=None) Columns with these types are dropped from the dataset. maximum_cardinality: bool, optional (default=True) Whether to drop categorical columns with maximum cardinality, i.e. the number of unique values is equal to the number of instances. Usually the case for names, IDs, etc... minimum_cardinality: bool, optional (default=True) Whether to drop columns with minimum cardinality, i.e. all values in the column are the same. strip_categorical: bool, optional (default=True) Whether to strip the spaces from the categorical columns. drop_duplicates: bool, optional (default=False) Whether to drop duplicate rows. Only the first occurrence of every duplicated row is kept. missing_target: bool, optional (default=True) Whether to drop rows with missing values in the target column. Is ignored if y is not provided. encode_target: bool, optional (default=True) Whether to Label-encode the target column. Is ignored if y is not provided. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Attributes Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. mapping: dict Dictionary of the target values mapped to their respective encoded integer. Only available if encode_target=True. Methods fit_transform Same as transform. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. 
Returns: self: Cleaner Estimator instance. method transform (X, y=None) [source] Apply the data cleaning steps on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean(maximum_cardinality=False) or from atom.data_cleaning import Cleaner cleaner = Cleaner(maximum_cardinality=False) X, y = cleaner.transform(X, y)","title":"Cleaner"},{"location":"API/data_cleaning/cleaner/#cleaner","text":"class atom.data_cleaning. Cleaner (prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True, strip_categorical=True, drop_duplicates=False, missing_target=True, encode_target=True, verbose=0, logger=None) [source] Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. This class can be accessed from atom through the clean method. Read more in the user guide . Parameters: prohibited_types: str, sequence or None, optional (default=None) Columns with these types are dropped from the dataset. maximum_cardinality: bool, optional (default=True) Whether to drop categorical columns with maximum cardinality, i.e. the number of unique values is equal to the number of instances. Usually the case for names, IDs, etc... minimum_cardinality: bool, optional (default=True) Whether to drop columns with minimum cardinality, i.e. all values in the column are the same. strip_categorical: bool, optional (default=True) Whether to strip the spaces from the categorical columns. drop_duplicates: bool, optional (default=False) Whether to drop duplicate rows. Only the first occurrence of every duplicated row is kept. missing_target: bool, optional (default=True) Whether to drop rows with missing values in the target column. Is ignored if y is not provided. encode_target: bool, optional (default=True) Whether to Label-encode the target column. Is ignored if y is not provided. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation.","title":"Cleaner"},{"location":"API/data_cleaning/cleaner/#attributes","text":"Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. mapping: dict Dictionary of the target values mapped to their respective encoded integer. 
Only available if encode_target=True.","title":"Attributes"},{"location":"API/data_cleaning/cleaner/#methods","text":"fit_transform Same as transform. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Cleaner Estimator instance. method transform (X, y=None) [source] Apply the data cleaning steps on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/cleaner/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean(maximum_cardinality=False) or from atom.data_cleaning import Cleaner cleaner = Cleaner(maximum_cardinality=False) X, y = cleaner.transform(X, y)","title":"Example"},{"location":"API/data_cleaning/encoder/","text":"Encoder class atom.data_cleaning. Encoder (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None, verbose=0, logger=None, **kwargs) [source] Perform encoding of categorical features. The encoding type depends on the number of classes in the column: If n_classes=2, use Ordinal-encoding. If 2 < n_classes <= max_onehot , use OneHot-encoding. If n_classes > max_onehot , use strategy -encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. An error is raised if it encounters missing values or unknown classes when transforming. This class can be accessed from atom through the encode method. Read more in the user guide . Parameters: strategy: str, optional (default=\"LeaveOneOut\") Type of encoding to use for high cardinality features. 
Choose from one of the estimators available in the category-encoders package except for: OneHotEncoder: Use the max_onehot parameter. HashingEncoder: Incompatibility of APIs. max_onehot: int or None, optional (default=10) Maximum number of unique values in a feature to perform one-hot-encoding. If None, it will always use strategy when n_unique > 2. frac_to_other: float, optional (default=None) Classes with less occurrences than n_rows * frac_to_other are replaced with the string other . If None, skip this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's categorical attribute for a list of the categorical columns in the dataset. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: Encoder Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Encoder Estimator instance. method transform (X, y=None) [source] Encode the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set. 
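As a worked illustration of the decision rule above, a small sketch on made-up data (the column names, cardinalities and sample size are invented for the example; the comments note which encoding each column receives with max_onehot=10):

import numpy as np
import pandas as pd
from atom.data_cleaning import Encoder

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "binary": rng.choice(["yes", "no"], size=200),                   # 2 classes -> Ordinal-encoding
    "small": rng.choice(["a", "b", "c", "d"], size=200),             # 4 classes -> OneHot-encoding
    "large": rng.choice([f"cat_{i}" for i in range(30)], size=200),  # ~30 classes -> LeaveOneOut-encoding
})
y = rng.integers(0, 2, size=200)

encoder = Encoder(strategy="LeaveOneOut", max_onehot=10)
X_t = encoder.fit_transform(X, y)
print(X_t.shape)  # the 4-class column expands into one column per class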
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.encode(strategy=\"CatBoost\", max_onehot=5) or from atom.data_cleaning import Encoder encoder = Encoder(strategy=\"CatBoost\", max_onehot=5) encoder.fit(X_train, y_train) X = encoder.transform(X)","title":"Encoder"},{"location":"API/data_cleaning/encoder/#encoder","text":"class atom.data_cleaning. Encoder (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None, verbose=0, logger=None, **kwargs) [source] Perform encoding of categorical features. The encoding type depends on the number of classes in the column: If n_classes=2, use Ordinal-encoding. If 2 < n_classes <= max_onehot , use OneHot-encoding. If n_classes > max_onehot , use strategy -encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. An error is raised if it encounters missing values or unknown classes when transforming. This class can be accessed from atom through the encode method. Read more in the user guide . Parameters: strategy: str, optional (default=\"LeaveOneOut\") Type of encoding to use for high cardinality features. Choose from one of the estimators available in the category-encoders package except for: OneHotEncoder: Use the max_onehot parameter. HashingEncoder: Incompatibility of APIs. max_onehot: int or None, optional (default=10) Maximum number of unique values in a feature to perform one-hot-encoding. If None, it will always use strategy when n_unique > 2. frac_to_other: float, optional (default=None) Classes with less occurrences than n_rows * frac_to_other are replaced with the string other . If None, skip this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's categorical attribute for a list of the categorical columns in the dataset.","title":"Encoder"},{"location":"API/data_cleaning/encoder/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: Encoder Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. 
Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Encoder Estimator instance. method transform (X, y=None) [source] Encode the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set.","title":"Methods"},{"location":"API/data_cleaning/encoder/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.encode(strategy=\"CatBoost\", max_onehot=5) or from atom.data_cleaning import Encoder encoder = Encoder(strategy=\"CatBoost\", max_onehot=5) encoder.fit(X_train, y_train) X = encoder.transform(X)","title":"Example"},{"location":"API/data_cleaning/imputer/","text":"Imputer class atom.data_cleaning. Imputer (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, verbose=0, logger=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the missing attribute to customize what are considered \"missing values\". This class can be accessed from atom through the impute method. Read more in the user guide . Parameters: strat_num: str, int or float, optional (default=\"drop\") Imputing strategy for numerical columns. Choose from: \"drop\": Drop rows containing missing values. \"mean\": Impute with mean of column. \"median\": Impute with median of column. \"knn\": Impute using a K-Nearest Neighbors approach. \"most_frequent\": Impute with most frequent value. int or float: Impute with provided numerical value. strat_cat: str, optional (default=\"drop\") Imputing strategy for categorical columns. Choose from: \"drop\": Drop rows containing missing values. \"most_frequent\": Impute with most frequent value. str: Impute with provided string. min_frac_rows: float or None, optional (default=None) Minimum fraction of non-missing values in a row (if less, the row is removed). If None, ignore this step. min_frac_cols: float or None, optional (default=None) Minimum fraction of non-missing values in a column (if less, the column is removed). If None, ignore this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's nans attribute for an overview of the number of missing values per column. 
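A minimal sketch of that workflow (the toy data below is made up); the missing attribute described in the Attributes section below is overwritten before fitting so that "?" is treated as a missing value:
import numpy as np
import pandas as pd
from atom.data_cleaning import Imputer

X = pd.DataFrame({
    "age": [22, np.nan, 35, 41, np.nan, 29],                    # numerical column with NaNs
    "job": ["teacher", "?", "nurse", "nurse", "teacher", "?"],  # "?" marks missing values
})

imputer = Imputer(strat_num="median", strat_cat="most_frequent", verbose=2)
imputer.missing = ["", "?", "NA", "unknown"]  # customize what counts as a missing value
X_imputed = imputer.fit_transform(X)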
Attributes Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Imputer Fitted instance of self. method fit_transform (X, y=None) [source] Fit the Imputer and return the imputed data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: imputer Estimator instance. method transform (X, y=None) [source] Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,) Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) or from atom.data_cleaning import Imputer imputer = Imputer(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) imputer.fit(X_train, y_train) X = imputer.transform(X)","title":"Imputer"},{"location":"API/data_cleaning/imputer/#imputer","text":"class atom.data_cleaning. 
Imputer (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, verbose=0, logger=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the missing attribute to customize what are considered \"missing values\". This class can be accessed from atom through the impute method. Read more in the user guide . Parameters: strat_num: str, int or float, optional (default=\"drop\") Imputing strategy for numerical columns. Choose from: \"drop\": Drop rows containing missing values. \"mean\": Impute with mean of column. \"median\": Impute with median of column. \"knn\": Impute using a K-Nearest Neighbors approach. \"most_frequent\": Impute with most frequent value. int or float: Impute with provided numerical value. strat_cat: str, optional (default=\"drop\") Imputing strategy for categorical columns. Choose from: \"drop\": Drop rows containing missing values. \"most_frequent\": Impute with most frequent value. str: Impute with provided string. min_frac_rows: float or None, optional (default=None) Minimum fraction of non-missing values in a row (if less, the row is removed). If None, ignore this step. min_frac_cols: float or None, optional (default=None) Minimum fraction of non-missing values in a column (if less, the column is removed). If None, ignore this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's nans attribute for an overview of the number of missing values per column.","title":"Imputer"},{"location":"API/data_cleaning/imputer/#attributes","text":"Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators.","title":"Attributes"},{"location":"API/data_cleaning/imputer/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Imputer Fitted instance of self. method fit_transform (X, y=None) [source] Fit the Imputer and return the imputed data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. 
y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: imputer Estimator instance. method transform (X, y=None) [source] Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,) Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/imputer/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) or from atom.data_cleaning import Imputer imputer = Imputer(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) imputer.fit(X_train, y_train) X = imputer.transform(X)","title":"Example"},{"location":"API/data_cleaning/pruner/","text":"Pruner class atom.data_cleaning. Pruner (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, verbose=0, logger=None, **kwargs) [source] Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed from atom through the prune method. Read more in the user guide . Parameters: strategy: str, optional (default=\"z-score\") Strategy with which to select the outliers. Choose from: \"z-score\": Uses the z-score of each data value. \"iForest\": Uses an Isolation Forest . \"EE\": Uses an Elliptic Envelope . \"LOF\": Uses a Local Outlier Factor . \"SVM\": Uses a One-class SVM . \"DBSCAN\": Uses DBSCAN clustering. \"OPTICS\": Uses OPTICS clustering. method: int, float or str, optional (default=\"drop\") Method to apply on the outliers. Only the z-score strategy accepts another method than \"drop\". Choose from: \"drop\": Drop any sample with outlier values. \"min_max\": Replace the outlier with the min or max of the column. Any numerical value with which to replace the outliers. max_sigma: int or float, optional (default=3) Maximum allowed standard deviations from the mean of the column. If more, it is considered an outlier. Only if strategy=\"z-score\". include_target: bool, optional (default=False) Whether to include the target column in the transformation. This can be useful for regression tasks. verbose: int, optional (default=0) Verbosity level of the class. 
Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's outliers attribute for an overview of the number of outlier values per column. Attributes Attributes: : sklearn estimator Estimator instance (lowercase strategy) used to prune the data, e.g. pruner.iforest for the isolation forest strategy. Methods get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Pruner Estimator instance. method transform (X, y=None) [source] Apply the outlier strategy on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.prune(strategy=\"z-score\", max_sigma=2, include_target=True) or from atom.data_cleaning import Pruner pruner = Pruner(strategy=\"z-score\", max_sigma=2, include_target=True) X_train, y_train = pruner.transform(X_train, y_train)","title":"Pruner"},{"location":"API/data_cleaning/pruner/#pruner","text":"class atom.data_cleaning. Pruner (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, verbose=0, logger=None, **kwargs) [source] Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed from atom through the prune method. Read more in the user guide . Parameters: strategy: str, optional (default=\"z-score\") Strategy with which to select the outliers. Choose from: \"z-score\": Uses the z-score of each data value. \"iForest\": Uses an Isolation Forest . \"EE\": Uses an Elliptic Envelope . \"LOF\": Uses a Local Outlier Factor . \"SVM\": Uses a One-class SVM . \"DBSCAN\": Uses DBSCAN clustering. \"OPTICS\": Uses OPTICS clustering. 
method: int, float or str, optional (default=\"drop\") Method to apply on the outliers. Only the z-score strategy accepts another method than \"drop\". Choose from: \"drop\": Drop any sample with outlier values. \"min_max\": Replace the outlier with the min or max of the column. Any numerical value with which to replace the outliers. max_sigma: int or float, optional (default=3) Maximum allowed standard deviations from the mean of the column. If more, it is considered an outlier. Only if strategy=\"z-score\". include_target: bool, optional (default=False) Whether to include the target column in the transformation. This can be useful for regression tasks. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's outliers attribute for an overview of the number of outlier values per column.","title":"Pruner"},{"location":"API/data_cleaning/pruner/#attributes","text":"Attributes: : sklearn estimator Estimator instance (lowercase strategy) used to prune the data, e.g. pruner.iforest for the isolation forest strategy.","title":"Attributes"},{"location":"API/data_cleaning/pruner/#methods","text":"get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Pruner Estimator instance. method transform (X, y=None) [source] Apply the outlier strategy on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. 
Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/pruner/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.prune(strategy=\"z-score\", max_sigma=2, include_target=True) or from atom.data_cleaning import Pruner pruner = Pruner(strategy=\"z-score\", max_sigma=2, include_target=True) X_train, y_train = pruner.transform(X_train, y_train)","title":"Example"},{"location":"API/data_cleaning/scaler/","text":"Scaler class atom.data_cleaning. Scaler (strategy=\"standard\", verbose=0, logger=None) [source] This class applies one of sklearn's scalers. It also returns a dataframe when provided, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the scale method. Read more in the user guide . Parameters: strategy: str, optional (default=\"standard\") Scaler object with which to scale the data. Options are: standard: Scale with StandardScaler . minmax: Scale with MinMaxScaler . maxabs: Scale with MaxAbsScaler . robust: Scale with RobustScaler . verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's scaled attribute to check if the feature set is scaled. Attributes Attributes: scaler: sklearn transformer Estimator's instance with which the data is scaled. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Compute the mean and std to be used for later scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Scaler Fitted instance of self. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. 
Returns: self: Scaler Estimator instance. method transform (X, y=None) [source] Perform standardization by centering and scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.scale() or from atom.data_cleaning import Scaler scaler = Scaler() scaler.fit(X_train) X = scaler.transform(X)","title":"Scaler"},{"location":"API/data_cleaning/scaler/#scaler","text":"class atom.data_cleaning. Scaler (strategy=\"standard\", verbose=0, logger=None) [source] This class applies one of sklearn's scalers. It also returns a dataframe when provided, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the scale method. Read more in the user guide . Parameters: strategy: str, optional (default=\"standard\") Scaler object with which to scale the data. Options are: standard: Scale with StandardScaler . minmax: Scale with MinMaxScaler . maxabs: Scale with MaxAbsScaler . robust: Scale with RobustScaler . verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's scaled attribute to check if the feature set is scaled.","title":"Scaler"},{"location":"API/data_cleaning/scaler/#attributes","text":"Attributes: scaler: sklearn transformer Estimator's instance with which the data is scaled.","title":"Attributes"},{"location":"API/data_cleaning/scaler/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Compute the mean and std to be used for later scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Scaler Fitted instance of self. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. 
method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Scaler Estimator instance. method transform (X, y=None) [source] Perform standardization by centering and scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set.","title":"Methods"},{"location":"API/data_cleaning/scaler/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.scale() or from atom.data_cleaning import Scaler scaler = Scaler() scaler.fit(X_train) X = scaler.transform(X)","title":"Example"},{"location":"API/feature_engineering/feature_generator/","text":"FeatureGenerator class atom.feature_engineering. FeatureGenerator (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide . Parameters: strategy: str, optional (default=\"DFS\") Strategy to create new features. Choose from: \"DFS\" to use Deep Feature Synthesis. \"GFG\" or \"genetic\" to use Genetic Feature Generation. n_features: int or None, optional (default=None) Number of newly generated features to add to the dataset (no more than 1% of the population for the genetic strategy). If None, select all created features. generations: int, optional (default=20) Number of generations to evolve. Only for the genetic strategy. population: int, optional (default=500) Number of programs in each generation. Only for the genetic strategy. operators: str, list, tuple or None, optional (default=None) Name of the operators to be used on the features. None to use all. Choose from: \"add\", \"sub\", \"mul\", \"div\", \"sqrt\", \"log\", \"sin\", \"cos\", \"tan\". n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. 
Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task. Attributes Attributes: symbolic_transformer: SymbolicTransformer Instance used to calculate the genetic features. Only for the genetic strategy. genetic_features: pd.DataFrame Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include: name: Name of the feature (automatically created). description: Operators used to create this feature. fitness: Fitness score. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureGenerator Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Feature set with the newly generated features. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureGenerator Estimator instance. method transform (X, y=None) [source] Generate new features. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Feature set with the newly generated features. 
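Following the second warning above, a sketch of how a DFS run with n_jobs>1 might be wrapped in a __main__ guard (the breast cancer dataset is used purely for illustration):
from sklearn.datasets import load_breast_cancer
from atom.feature_engineering import FeatureGenerator

def main():
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    fgen = FeatureGenerator(strategy="DFS", n_features=5, operators=["add", "mul"], n_jobs=2)
    return fgen.fit_transform(X, y)  # fit on the data, then return it with the new features

if __name__ == "__main__":  # guard required: DFS with n_jobs>1 spawns new python processes
    X_new = main()
    print(X_new.shape)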
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_generation(strategy=\"genetic\", n_features=3, generations=30, population=400) or from atom.feature_engineering import FeatureGenerator feature_generator = FeatureGenerator(strategy=\"genetic\", n_features=3, generations=30, population=400) feature_generator.fit(X_train, y_train) X = feature_generator.transform(X)","title":"FeatureGenerator"},{"location":"API/feature_engineering/feature_generator/#featuregenerator","text":"class atom.feature_engineering. FeatureGenerator (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide . Parameters: strategy: str, optional (default=\"DFS\") Strategy to create new features. Choose from: \"DFS\" to use Deep Feature Synthesis. \"GFG\" or \"genetic\" to use Genetic Feature Generation. n_features: int or None, optional (default=None) Number of newly generated features to add to the dataset (no more than 1% of the population for the genetic strategy). If None, select all created features. generations: int, optional (default=20) Number of generations to evolve. Only for the genetic strategy. population: int, optional (default=500) Number of programs in each generation. Only for the genetic strategy. operators: str, list, tuple or None, optional (default=None) Name of the operators to be used on the features. None to use all. Choose from: \"add\", \"sub\", \"mul\", \"div\", \"sqrt\", \"log\", \"sin\", \"cos\", \"tan\". n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. 
The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task.","title":"FeatureGenerator"},{"location":"API/feature_engineering/feature_generator/#attributes","text":"Attributes: symbolic_transformer: SymbolicTransformer Instance used to calculate the genetic features. Only for the genetic strategy. genetic_features: pd.DataFrame Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include: name: Name of the feature (automatically created). description: Operators used to create this feature. fitness: Fitness score.","title":"Attributes"},{"location":"API/feature_engineering/feature_generator/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureGenerator Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Feature set with the newly generated features. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureGenerator Estimator instance. method transform (X, y=None) [source] Generate new features. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. 
Returns: X: pd.DataFrame Feature set with the newly generated features.","title":"Methods"},{"location":"API/feature_engineering/feature_generator/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_generation(strategy=\"genetic\", n_features=3, generations=30, population=400) or from atom.feature_engineering import FeatureGenerator feature_generator = FeatureGenerator(strategy=\"genetic\", n_features=3, generations=30, population=400) feature_generator.fit(X_train, y_train) X = feature_generator.transform(X)","title":"Example"},{"location":"API/feature_engineering/feature_selector/","text":"FeatureSelector class atom.feature_engineering. FeatureSelector (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. This class can be accessed from atom through the feature_selection method. Read more in the user guide . Parameters: strategy: string or None, optional (default=None) Feature selection strategy to use. Choose from: None: Do not perform any feature selection algorithm. \"univariate\": Select best features according to a univariate F-test. \"PCA\": Perform principal component analysis. \"SFM\": Select best features according to a model. \"RFE\": Perform recursive feature elimination. \"RFECV\": Perform RFE with cross-validated selection. \"SFS\": Perform Sequential Feature Selection. solver: string, estimator or None, optional (default=None) Solver or model to use for the feature selection strategy. See sklearn's documentation for an extended description of the choices. Select None for the default option per strategy (only for univariate and PCA). for \"univariate\", choose from: \"f_classif\" \"f_regression\" \"mutual_info_classif\" \"mutual_info_regression\" \"chi2\" Any function taking two arrays (X, y), and returning arrays (scores, p-values). See the sklearn documentation . for \"PCA\", choose from: \"auto\" (default) \"full\" \"arpack\" \"randomized\" for \"SFM\", \"RFE\", \"RFECV\" and \"SFS\": The base estimator. For SFM, RFE and RFECV, it should have either a feature_importances_ or coef_ attribute after fitting. You can use one of ATOM's predefined models . Add _class or _reg after the model's name to specify a classification or regression task, e.g. solver=\"LGB_reg\" (not necessary if called from an atom instance). No default option. n_features: int, float or None, optional (default=None) Number of features to select. Choose from: if None: Select all features. if < 1: Fraction of the total features to select. if >= 1: Number of features to select. If strategy=\"SFM\" and the threshold parameter is not specified, the threshold is set to -np.inf to select the n_features features. If strategy=\"RFECV\", it's the minimum number of features to select. max_frac_repeated: float or None, optional (default=1.) Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. None to skip this step. 
max_correlation: float or None, optional (default=1.) Minimum value of the Pearson correlation coefficient to identify correlated features. A value of 1 removes one of two equal columns. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. None to skip this step. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Any extra keyword argument for the PCA, SFM, RFE, RFECV and SFS estimators. See the corresponding sklearn documentation for the available options. Info If strategy=\"PCA\", the data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE, RFECV and SFS strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs. Attributes Utility attributes Attributes: collinear: pd.DataFrame Dataframe of the removed collinear features. Columns include: drop_feature: Name of the feature dropped by the method. correlated feature: Name of the correlated feature(s). correlation_value: Pearson correlation coefficients of the feature pairs. feature_importance: list Remaining features ordered by importance. Only if strategy in [\"univariate\", \"SFM\", \"RFE\", \"RFECV\"]. For RFE and RFECV, the importance is extracted from the external estimator fitted on the reduced set. : sklearn estimator Estimator instance (lowercase strategy) used to transform the data, e.g. feature_selector.pca for the PCA strategy. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per component. plot_rfecv Plot the scores obtained by the estimator on the RFECV. reset_aesthetics Reset the plot aesthetics to their default values. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. 
Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies all need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureSelector Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. See plot_pca for a description of the parameters. method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. See plot_components for a description of the parameters. method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the scores obtained by the estimator fitted on every subset of the data. See plot_rfecv for a description of the parameters. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureSelector Estimator instance. method transform (X, y=None) [source] Transform the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set. 
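A sketch of the SFM route with a scikit-learn estimator as solver, inspecting the utility attributes afterwards (X_train and y_train are assumed to exist, as in the example that follows):
from sklearn.ensemble import RandomForestClassifier
from atom.feature_engineering import FeatureSelector

selector = FeatureSelector(
    strategy="SFM",                                # select features from a model
    solver=RandomForestClassifier(random_state=1),
    n_features=10,                                 # keep the 10 most important features
    max_correlation=0.9,                           # drop one of every highly correlated pair
)
X_reduced = selector.fit_transform(X_train, y_train)

print(selector.collinear)           # removed collinear features and their correlations
print(selector.feature_importance)  # remaining features ordered by importance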
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"pca\", n_features=12, whiten=True, max_correlation=0.96) atom.plot_pca(filename=\"pca\", figsize=(8, 5)) or from atom.feature_engineering import FeatureSelector feature_selector = FeatureSelector(strategy=\"pca\", n_features=12, whiten=True, max_correlation=0.96) feature_selector.fit(X_train, y_train) X = feature_selector.transform(X, y) feature_selector.plot_pca(filename=\"pca\", figsize=(8, 5))","title":"FeatureSelector"},{"location":"API/feature_engineering/feature_selector/#featureselector","text":"class atom.feature_engineering. FeatureSelector (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. This class can be accessed from atom through the feature_selection method. Read more in the user guide . Parameters: strategy: string or None, optional (default=None) Feature selection strategy to use. Choose from: None: Do not perform any feature selection algorithm. \"univariate\": Select best features according to a univariate F-test. \"PCA\": Perform principal component analysis. \"SFM\": Select best features according to a model. \"RFE\": Perform recursive feature elimination. \"RFECV\": Perform RFE with cross-validated selection. \"SFS\": Perform Sequential Feature Selection. solver: string, estimator or None, optional (default=None) Solver or model to use for the feature selection strategy. See sklearn's documentation for an extended description of the choices. Select None for the default option per strategy (only for univariate and PCA). for \"univariate\", choose from: \"f_classif\" \"f_regression\" \"mutual_info_classif\" \"mutual_info_regression\" \"chi2\" Any function taking two arrays (X, y), and returning arrays (scores, p-values). See the sklearn documentation . for \"PCA\", choose from: \"auto\" (default) \"full\" \"arpack\" \"randomized\" for \"SFM\", \"RFE\", \"RFECV\" and \"SFS\": The base estimator. For SFM, RFE and RFECV, it should have either a feature_importances_ or coef_ attribute after fitting. You can use one of ATOM's predefined models . Add _class or _reg after the model's name to specify a classification or regression task, e.g. solver=\"LGB_reg\" (not necessary if called from an atom instance). No default option. n_features: int, float or None, optional (default=None) Number of features to select. Choose from: if None: Select all features. if < 1: Fraction of the total features to select. if >= 1: Number of features to select. If strategy=\"SFM\" and the threshold parameter is not specified, the threshold is set to -np.inf to select the n_features features. If strategy=\"RFECV\", it's the minimum number of features to select. max_frac_repeated: float or None, optional (default=1.) Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. None to skip this step. max_correlation: float or None, optional (default=1.) 
Minimum value of the Pearson correlation coefficient to identify correlated features. A value of 1 removes one of two equal columns. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. None to skip this step. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Any extra keyword argument for the PCA, SFM, RFE, RFECV and SFS estimators. See the corresponding sklearn documentation for the available options. Info If strategy=\"PCA\", the data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE, RFECV and SFS strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.","title":"FeatureSelector"},{"location":"API/feature_engineering/feature_selector/#attributes","text":"","title":"Attributes"},{"location":"API/feature_engineering/feature_selector/#utility-attributes","text":"Attributes: collinear: pd.DataFrame Dataframe of the removed collinear features. Columns include: drop_feature: Name of the feature dropped by the method. correlated feature: Name of the correlated feature(s). correlation_value: Pearson correlation coefficients of the feature pairs. feature_importance: list Remaining features ordered by importance. Only if strategy in [\"univariate\", \"SFM\", \"RFE\", \"RFECV\"]. For RFE and RFECV, the importance is extracted from the external estimator fitted on the reduced set. : sklearn estimator Estimator instance (lowercase strategy) used to transform the data, e.g. feature_selector.pca for the PCA strategy.","title":"Utility attributes"},{"location":"API/feature_engineering/feature_selector/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/feature_engineering/feature_selector/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per component. 
plot_rfecv Plot the scores obtained by the estimator on the RFECV. reset_aesthetics Reset the plot aesthetics to their default values. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies all need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureSelector Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. See plot_pca for a description of the parameters. method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. See plot_components for a description of the parameters. method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the scores obtained by the estimator fitted on every subset of the data. See plot_rfecv for a description of the parameters. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureSelector Estimator instance. method transform (X, y=None) [source] Transform the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. 
Returns: X: pd.DataFrame Transformed feature set.","title":"Methods"},{"location":"API/feature_engineering/feature_selector/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"pca\", n_features=12, whiten=True, max_correlation=0.96) atom.plot_pca(filename=\"pca\", figsize=(8, 5)) or from atom.feature_engineering import FeatureSelector feature_selector = FeatureSelector(strategy=\"pca\", n_features=12, whiten=True, max_correlation=0.96) feature_selector.fit(X_train, y_train) X = feature_selector.transform(X, y) feature_selector.plot_pca(filename=\"pca\", figsize=(8, 5))","title":"Example"},{"location":"API/models/adab/","text":"AdaBoost (AdaB) AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on the original dataset and then fits additional copies of the algorithm on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. Corresponding estimators are: AdaBoostClassifier for classification tasks. AdaBoostRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The algorithm parameter is only used with AdaBoostClassifier. The loss parameter is only used with AdaBoostRegressor. The random_state parameter is set equal to that of the trainer. Dimensions: n_estimators: int, default=50 Integer(10, 500, name=\"n_estimators\") learning_rate: float, default=1.0 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") algorithm: str, default=\"SAMME.R\" Categorical([\"SAMME.R\", \"SAMME\"], name=\"algorithm\") loss: str, default=\"linear\" Categorical([\"linear\", \"square\", \"exponential\"], name=\"loss\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.adab.plot_permutation_importance() or atom.adab.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. 
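A minimal sketch, not taken from the original docs, of how the calibrate and rename utility methods described above might be chained on a fitted AdaBoost model; the cv value and the new tag are purely illustrative:

```python
from atom import ATOMClassifier

# X, y: any binary classification dataset, as in the docs' examples
atom = ATOMClassifier(X, y)
atom.run(models="AdaB")

# Probability calibration via sklearn's CalibratedClassifierCV.
# Extra keyword arguments (here cv=5) are passed on to that class.
atom.adab.calibrate(cv=5)

# Change the model's tag; the AdaB acronym stays at the start of the name.
atom.adab.rename("calibrated")
```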
method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"AdaB\", metric=\"poisson\", est_params={\"algorithm\": \"SAMME.R\"})","title":"AdaBoost"},{"location":"API/models/adab/#adaboost-adab","text":"AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on the original dataset and then fits additional copies of the algorithm on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. Corresponding estimators are: AdaBoostClassifier for classification tasks. AdaBoostRegressor for regression tasks. Read more in sklearn's documentation .","title":"AdaBoost (AdaB)"},{"location":"API/models/adab/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The algorithm parameter is only used with AdaBoostClassifier. The loss parameter is only used with AdaBoostRegressor. The random_state parameter is set equal to that of the trainer. Dimensions: n_estimators: int, default=50 Integer(10, 500, name=\"n_estimators\") learning_rate: float, default=1.0 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") algorithm: str, default=\"SAMME.R\" Categorical([\"SAMME.R\", \"SAMME\"], name=\"algorithm\") loss: str, default=\"linear\" Categorical([\"linear\", \"square\", \"exponential\"], name=\"loss\")","title":"Hyperparameters"},{"location":"API/models/adab/#attributes","text":"","title":"Attributes"},{"location":"API/models/adab/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/adab/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/adab/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/adab/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.adab.plot_permutation_importance() or atom.adab.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. 
delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/adab/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"AdaB\", metric=\"poisson\", est_params={\"algorithm\": \"SAMME.R\"})","title":"Example"},{"location":"API/models/ard/","text":"Automatic Relevance Determination (ARD) Automatic Relevance Determination is very similar to Bayesian Ridge , but can lead to sparser coefficients. Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to be in Gaussian distributions. Corresponding estimators are: ARDRegression for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
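The hyperparameter defaults below can be overridden when running the model. A short sketch following the est_params pattern used in the examples elsewhere in these docs (the chosen value is an arbitrary assumption, not a recommendation):

```python
from atom import ATOMRegressor

atom = ATOMRegressor(X, y)

# Override an ARDRegression default instead of relying on the
# package default (n_iter=300); 500 is an illustrative value.
atom.run(models="ARD", est_params={"n_iter": 500})
```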
Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. 
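A small sketch, assuming a fitted ARD model, of how the lazily computed prediction attributes listed above are accessed:

```python
from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
atom.run(models="ARD")

# The first access computes the predictions on the test set;
# subsequent accesses reuse the cached result.
predictions = atom.ard.predict_test
print(atom.ard.score_test)  # model's score on the test set
```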
Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ard.plot_permutation_importance() or atom.ard.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ARD\", n_calls=20, n_initial_points=7, bagging=5)","title":"Automated Relevance Determination"},{"location":"API/models/ard/#automatic-relevance-determination-ard","text":"Automatic Relevance Determination is very similar to Bayesian Ridge , but can lead to sparser coefficients. Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to be in Gaussian distributions. Corresponding estimators are: ARDRegression for regression tasks. Read more in sklearn's documentation .","title":"Automatic Relevance Determination (ARD)"},{"location":"API/models/ard/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\")","title":"Hyperparameters"},{"location":"API/models/ard/#attributes","text":"","title":"Attributes"},{"location":"API/models/ard/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. 
columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ard/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ard/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ard/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ard.plot_permutation_importance() or atom.ard.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. 
If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ard/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ARD\", n_calls=20, n_initial_points=7, bagging=5)","title":"Example"},{"location":"API/models/bag/","text":"Bagging (Bag) Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree ), by introducing randomization into its construction procedure and then making an ensemble out of it. Corresponding estimators are: BaggingClassifier for classification tasks. BaggingRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=10 Integer(10, 500, name=\"n_estimators\") max_samples: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_samples\") max_features: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_features\") bootstrap: bool, default=True Categorical([True, False], name=\"bootstrap\") bootstrap_features: bool, default=False Categorical([True, False], name=\"bootstrap_features\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. 
\"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.bag.plot_permutation_importance() or atom.bag.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. 
Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Bag\")","title":"Bagging"},{"location":"API/models/bag/#bagging-bag","text":"Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree ), by introducing randomization into its construction procedure and then making an ensemble out of it. Corresponding estimators are: BaggingClassifier for classification tasks. BaggingRegressor for regression tasks. Read more in sklearn's documentation .","title":"Bagging (Bag)"},{"location":"API/models/bag/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=10 Integer(10, 500, name=\"n_estimators\") max_samples: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_samples\") max_features: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_features\") bootstrap: bool, default=True Categorical([True, False], name=\"bootstrap\") bootstrap_features: bool, default=False Categorical([True, False], name=\"bootstrap_features\")","title":"Hyperparameters"},{"location":"API/models/bag/#attributes","text":"","title":"Attributes"},{"location":"API/models/bag/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. 
test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/bag/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/bag/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/bag/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.bag.plot_permutation_importance() or atom.bag.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/bag/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Bag\")","title":"Example"},{"location":"API/models/bnb/","text":"Bernoulli Naive Bayes (BNB) Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli models. Like Multinomial Naive bayes (MNB) , this classifier is suitable for discrete data. The difference is that while MNB works with occurrence counts, BNB is designed for binary/boolean features. Corresponding estimators are: BernoulliNB for classification tasks. Read more in sklearn's documentation . 
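To make the scoring method described above concrete, here is a hedged sketch on a binary task using this Bernoulli Naive Bayes model; the metric names are examples only:

```python
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(models="BNB")

# Any sklearn scorer can be requested...
print(atom.bnb.scoring("precision"))

# ...as well as the custom binary metrics listed above, e.g. the
# false positive rate, here evaluated on the training set.
print(atom.bnb.scoring("fpr", dataset="train"))
```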
Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. 
predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"BNB\", metric=\"precision\")","title":"Bernoulli Naive Bayes"},{"location":"API/models/bnb/#bernoulli-naive-bayes-bnb","text":"Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli models. Like Multinomial Naive bayes (MNB) , this classifier is suitable for discrete data. The difference is that while MNB works with occurrence counts, BNB is designed for binary/boolean features. 
Corresponding estimators are: BernoulliNB for classification tasks. Read more in sklearn's documentation .","title":"Bernoulli Naive Bayes (BNB)"},{"location":"API/models/bnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/bnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/bnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/bnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/bnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. 
This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/bnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/bnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"BNB\", metric=\"precision\")","title":"Example"},{"location":"API/models/br/","text":"Bayesian Ridge (BR) Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand. Corresponding estimators are: BayesianRidge for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. 
time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.br.plot_permutation_importance() or atom.br.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"BR\", n_calls=20, n_initial_points=7, bagging=5)","title":"Bayesian Ridge"},{"location":"API/models/br/#bayesian-ridge-br","text":"Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand. Corresponding estimators are: BayesianRidge for regression tasks. Read more in sklearn's documentation .","title":"Bayesian Ridge (BR)"},{"location":"API/models/br/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
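For illustration, the hedged sketch below reuses the page's own example call and then inspects the documented BO attributes; the diabetes data is an arbitrary stand-in, and the searched dimensions are the ones listed next.

```python
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

# Arbitrary regression data to stand in for (X, y).
X, y = load_diabetes(return_X_y=True)

atom = ATOMRegressor(X, y)
# The page's own example call: 20 BO iterations (7 random starts) over the
# dimensions listed below, followed by 5 bagging rounds.
atom.run(models="BR", n_calls=20, n_initial_points=7, bagging=5)

print(atom.br.best_params)                                # best dimension values found
print(atom.br.bo[["params", "score", "time_iteration"]])  # one row per BO step
print(atom.br.mean_bagging, atom.br.std_bagging)          # bagging summary
```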
Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\")","title":"Hyperparameters"},{"location":"API/models/br/#attributes","text":"","title":"Attributes"},{"location":"API/models/br/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/br/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/br/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. 
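A minimal sketch of this lazy mechanism, assuming an arbitrary regression dataset:

```python
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True)
atom = ATOMRegressor(X, y)
atom.run(models="BR")

# Nothing has been predicted yet; the first attribute access triggers the
# computation and the result is cached for later use.
preds = atom.br.predict_test
preds_again = atom.br.predict_test  # served from the cache

atom.br.reset_predictions()         # drop the cached arrays to free memory
```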
Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/br/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.br.plot_permutation_importance() or atom.br.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/br/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"BR\", n_calls=20, n_initial_points=7, bagging=5)","title":"Example"},{"location":"API/models/catb/","text":"CatBoost (CatB) CatBoost is a machine learning method based on gradient boosting over decision trees. Main advantages of CatBoost: Superior quality when compared with other GBDT models on many datasets. Best in class prediction speed. Corresponding estimators are: CatBoostClassifier for classification tasks. CatBoostRegressor for regression tasks. Read more in CatBoost's documentation . Note CatBoost allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The bootstrap_type parameter is set to \"Bernoulli\" to allow for the subsample parameter. The num_leaves and min_child_samples parameters are not available for the CPU implementation. The n_jobs and random_state parameters are set equal to those of the trainer. 
Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_lambda: int, default=0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. 
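As an illustration of the evals attribute together with CatBoost's early stopping (a sketch only: the dataset is arbitrary, the bo_params keys are taken from the example further down this page, and the catboost package must be installed):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier  # the "CatB" model requires the catboost package

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y)

# bo_params keys as in the page's example; early_stopping prunes unpromising fits.
atom.run(models="CatB", n_calls=10, bo_params={"early_stopping": 0.1})

evals = atom.catb.evals                    # dict with "metric", "train" and "test" keys
print(evals["metric"], len(evals["train"]))
print(atom.catb.metric_test)
```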
Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. 
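A hedged sketch of the scoring method just described, assuming a fitted classification run (the dataset, run metric and chosen scorer names are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier  # "CatB" assumes the catboost package is installed

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y)
atom.run(models="CatB", metric="f1")

print(atom.catb.scoring())                        # final result(s) for the run's metric
print(atom.catb.scoring("cm"))                    # confusion matrix as np.ndarray
print(atom.catb.scoring("tpr", dataset="train"))  # custom metric on the training set
print(atom.catb.scoring("roc_auc"))               # any name from sklearn's SCORERS works
```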
method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"CatB\", n_calls=50, bo_params={\"max_time\": 1000, \"early_stopping\": 0.1})","title":"CatBoost"},{"location":"API/models/catb/#catboost-catb","text":"CatBoost is a machine learning method based on gradient boosting over decision trees. Main advantages of CatBoost: Superior quality when compared with other GBDT models on many datasets. Best in class prediction speed. Corresponding estimators are: CatBoostClassifier for classification tasks. CatBoostRegressor for regression tasks. Read more in CatBoost's documentation . Note CatBoost allows early stopping to stop the training of unpromising models prematurely!","title":"CatBoost (CatB)"},{"location":"API/models/catb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The bootstrap_type parameter is set to \"Bernoulli\" to allow for the subsample parameter. The num_leaves and min_child_samples parameters are not available for the CPU implementation. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_lambda: int, default=0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/catb/#attributes","text":"","title":"Attributes"},{"location":"API/models/catb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/catb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. 
time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/catb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/catb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. 
Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/catb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"CatB\", n_calls=50, bo_params={\"max_time\": 1000, \"early_stopping\": 0.1})","title":"Example"},{"location":"API/models/catnb/","text":"Categorical Naive Bayes (CatNB) Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features. Corresponding estimators are: CategoricalNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. 
Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. 
Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CatNB\")","title":"Categorical Naive Bayes"},{"location":"API/models/catnb/#categorical-naive-bayes-catnb","text":"Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features. Corresponding estimators are: CategoricalNB for classification tasks. Read more in sklearn's documentation .","title":"Categorical Naive Bayes (CatNB)"},{"location":"API/models/catnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/catnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/catnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/catnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/catnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/catnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. 
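Before the individual method signatures below, a rough sketch of rename and delete; the integer-coded toy data is hypothetical, and it assumes models remain reachable as lowercase attributes of their (renamed) name:

```python
import numpy as np
from atom import ATOMClassifier

# Hypothetical toy data with integer-encoded categorical features,
# which is what CategoricalNB expects.
rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(200, 5))
y = rng.integers(0, 2, size=200)

atom = ATOMClassifier(X, y)
atom.run(models="CatNB")

atom.catnb.rename("2")   # the tag is appended after the acronym -> "CatNB2"
atom.catnb2.rename()     # passing nothing removes the tag again
atom.catnb.delete()      # drop the model from the trainer
```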
method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/catnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CatNB\")","title":"Example"},{"location":"API/models/cnb/","text":"Complement Naive Bayes (CNB) The Complement Naive Bayes classifier was designed to correct the \u201csevere assumptions\u201d made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. Corresponding estimators are: ComplementNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") norm: bool, default=False Categorical([True, False], name=\"norm\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. 
save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CNB\")","title":"Complement Naive Bayes"},{"location":"API/models/cnb/#complement-naive-bayes-cnb","text":"The Complement Naive Bayes classifier was designed to correct the \u201csevere assumptions\u201d made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. Corresponding estimators are: ComplementNB for classification tasks. Read more in sklearn's documentation .","title":"Complement Naive Bayes (CNB)"},{"location":"API/models/cnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") norm: bool, default=False Categorical([True, False], name=\"norm\")","title":"Hyperparameters"},{"location":"API/models/cnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/cnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/cnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/cnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/cnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/cnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CNB\")","title":"Example"},{"location":"API/models/en/","text":"Elastic Net (EN) Linear least squares with l1 and l2 regularization. Corresponding estimators are: ElasticNet for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. 
Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.en.plot_permutation_importance() or atom.en.predict(X) . 
The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"EN\", est_params={\"l1_ratio\": 0.75})","title":"Elastic Net"},{"location":"API/models/en/#elastic-net-en","text":"Linear least squares with l1 and l2 regularization. Corresponding estimators are: ElasticNet for regression tasks. Read more in sklearn's documentation .","title":"Elastic Net (EN)"},{"location":"API/models/en/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\")","title":"Hyperparameters"},{"location":"API/models/en/#attributes","text":"","title":"Attributes"},{"location":"API/models/en/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/en/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. 
\"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/en/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/en/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.en.plot_permutation_importance() or atom.en.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). 
dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/en/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"EN\", est_params={\"l1_ratio\": 0.75})","title":"Example"},{"location":"API/models/et/","text":"Extra-Trees (ET) Extra-Trees use a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Corresponding estimators are: ExtraTreesClassifier for classification tasks. ExtraTreesRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.et.plot_permutation_importance() or atom.et.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. 
method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ET\", metric=\"MSE\", n_calls=5, n_initial_points=1)","title":"Extra-Trees"},{"location":"API/models/et/#extra-trees-et","text":"Extra-Trees use a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Corresponding estimators are: ExtraTreesClassifier for classification tasks. ExtraTreesRegressor for regression tasks. Read more in sklearn's documentation .","title":"Extra-Trees (ET)"},{"location":"API/models/et/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. 
Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\")","title":"Hyperparameters"},{"location":"API/models/et/#attributes","text":"","title":"Attributes"},{"location":"API/models/et/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/et/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. 
std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/et/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/et/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.et.plot_permutation_importance() or atom.et.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). 
dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/et/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ET\", metric=\"MSE\", n_calls=5, n_initial_points=1)","title":"Example"},{"location":"API/models/gbm/","text":"Gradient Boosting Machine (GBM) A Gradient Boosting Machine builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Corresponding estimators are: GradientBoostingClassifier for classification tasks. GradientBoostingRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. For multiclass classification tasks, the loss parameter is always set to \"deviance\". The alpha parameter is only used when loss = \"huber\" or \"quantile\". The random_state parameter is set equal to that of the trainer. Dimensions: learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") criterion: str, default=\"friedman_mse\" Categorical([\"friedman_mse\", \"mae\", \"mse\"], name=\"criterion\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_depth: int, default=3 Integer(1, 10, name=\"max_depth\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0 Real(0, 0.035, name=\"ccp_alpha\") loss: str binary classifier: default=\"deviance\" Categorical([\"deviance\", \"exponential\"], name=\"loss\") regressor: default=\"ls\" Categorical([\"ls\", \"lad\", \"huber\", \"quantile\"], name=\"loss\") alpha: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"alpha\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() or atom.gbm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. 
method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GBM\")","title":"Gradient Boosting Machine"},{"location":"API/models/gbm/#gradient-boosting-machine-gbm","text":"A Gradient Boosting Machine builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Corresponding estimators are: GradientBoostingClassifier for classification tasks. GradientBoostingRegressor for regression tasks. Read more in sklearn's documentation .","title":"Gradient Boosting Machine (GBM)"},{"location":"API/models/gbm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. For multiclass classification tasks, the loss parameter is always set to \"deviance\". The alpha parameter is only used when loss = \"huber\" or \"quantile\". The random_state parameter is set equal to that of the trainer. 
Dimensions: learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") criterion: str, default=\"friedman_mse\" Categorical([\"friedman_mse\", \"mae\", \"mse\"], name=\"criterion\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_depth: int, default=3 Integer(1, 10, name=\"max_depth\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0 Real(0, 0.035, name=\"ccp_alpha\") loss: str binary classifier: default=\"deviance\" Categorical([\"deviance\", \"exponential\"], name=\"loss\") regressor: default=\"ls\" Categorical([\"ls\", \"lad\", \"huber\", \"quantile\"], name=\"loss\") alpha: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"alpha\")","title":"Hyperparameters"},{"location":"API/models/gbm/#attributes","text":"","title":"Attributes"},{"location":"API/models/gbm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gbm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. 
metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gbm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gbm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() or atom.gbm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. 
\"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gbm/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GBM\")","title":"Example"},{"location":"API/models/gnb/","text":"Gaussian Naive bayes (GNB) Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian. Corresponding estimators are: GaussianNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GNB has no parameters to tune with the BO. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. 
This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. 
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"GNB\")","title":"Gaussian Naive Bayes"},{"location":"API/models/gnb/#gaussian-naive-bayes-gnb","text":"Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian. Corresponding estimators are: GaussianNB for classification tasks. Read more in sklearn's documentation .","title":"Gaussian Naive bayes (GNB)"},{"location":"API/models/gnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GNB has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/gnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/gnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gnb/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. 
predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"GNB\")","title":"Example"},{"location":"API/models/gp/","text":"Gaussian Process (GP) Gaussian Processes are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: The prediction interpolates the observations. 
The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest. The disadvantages of Gaussian processes include: They are not sparse, i.e. they use the whole samples/features information to perform the prediction. They lose efficiency in high dimensional spaces, namely when the number of features exceeds a few dozen. Corresponding estimators are: GaussianProcessClassifier for classification tasks. GaussianProcessRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GP has no parameters to tune with the BO. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). 
score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GP\", metric=\"medae\")","title":"Gaussian Process"},{"location":"API/models/gp/#gaussian-process-gp","text":"Gaussian Processes are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: The prediction interpolates the observations. The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest. 
The disadvantages of Gaussian processes include: They are not sparse, i.e. they use the whole samples/features information to perform the prediction. They lose efficiency in high dimensional spaces, namely when the number of features exceeds a few dozen. Corresponding estimators are: GaussianProcessClassifier for classification tasks. GaussianProcessRegressor for regression tasks. Read more in sklearn's documentation .","title":"Gaussian Process (GP)"},{"location":"API/models/gp/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GP has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/gp/#attributes","text":"","title":"Attributes"},{"location":"API/models/gp/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gp/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gp/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). 
predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gp/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gp/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GP\", metric=\"medae\")","title":"Example"},{"location":"API/models/knn/","text":"K-Nearest Neighbors (KNN) K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest neighbors vote. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. 
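The Dimensions listed under Hyperparameters below are the skopt spaces searched by the BO. As a hedged sketch (X and y are placeholders; it assumes est_params behaves as in the kSVM example elsewhere in these docs, passing fixed values straight to the estimator), a run could combine the BO with a fixed parameter like this:

```python
from atom import ATOMClassifier

# Sketch only; X and y are placeholder data for a classification task.
atom = ATOMClassifier(X, y)
atom.run(
    models="KNN",
    metric="f1",
    n_calls=15,                           # number of BO iterations
    bo_params={"max_time": 300},          # same mechanism as the KNN example further down
    est_params={"algorithm": "kd_tree"},  # fixed value passed to KNeighborsClassifier
)
print(atom.knn.best_params)  # best combination found by the BO
print(atom.knn.bo.head())    # per-iteration log: params, estimator, score, time
```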
Corresponding estimators are: KNeighborsClassifier for classification tasks. KNeighborsRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. Dimensions: n_neighbors: int, default=5 Integer(1, 100, name=\"n_neighbors\") weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. 
This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.knn.plot_permutation_importance() or atom.knn.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"KNN\", metric=\"ME\", n_calls=20, bo_params={\"max_time\": 1000})","title":"K-Nearest Neighbors"},{"location":"API/models/knn/#k-nearest-neighbors-knn","text":"K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest neighbors vote. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. Corresponding estimators are: KNeighborsClassifier for classification tasks. KNeighborsRegressor for regression tasks. Read more in sklearn's documentation .","title":"K-Nearest Neighbors (KNN)"},{"location":"API/models/knn/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. Dimensions: n_neighbors: int, default=5 Integer(1, 100, name=\"n_neighbors\") weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\")","title":"Hyperparameters"},{"location":"API/models/knn/#attributes","text":"","title":"Attributes"},{"location":"API/models/knn/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/knn/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. 
std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/knn/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/knn/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.knn.plot_permutation_importance() or atom.knn.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/knn/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"KNN\", metric=\"ME\", n_calls=20, bo_params={\"max_time\": 1000})","title":"Example"},{"location":"API/models/ksvm/","text":"Kernel-SVM (kSVM) The implementation of the Kernel (non-linear) Support Vector Machine is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using a Linear Support Vector Machine or a Stochastic Gradient descent model instead. The multiclass support is handled according to a one-vs-one scheme. Corresponding estimators are: SVC for classification tasks. SVR for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The degree parameter is only used when kernel = \"poly\". The gamma parameter is always set to \"scale\" when kernel = \"poly\". The coef0 parameter is only used when kernel = \"rbf\". The random_state parameter is set equal to that of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") kernel: str, default=\"rbf\" Categorical([\"poly\", \"rbf\", \"sigmoid\"], name=\"kernel\") degree: int, default=3 Integer(2, 5, name=\"degree\"). gamma: str, default=\"scale\" Categorical([\"scale\", \"auto\"], name=\"gamma\") coef0: float, default=0 Real(-1.0, 1.0, name=\"coef0\"). shrinking: bool, default=True Categorical([True, False], name=\"shrinking\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. 
\"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. 
Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"kSVM\", metric=\"r2\", est_params={\"kernel\": \"rbf\"})","title":"Kernel-SVM"},{"location":"API/models/ksvm/#kernel-svm-ksvm","text":"The implementation of the Kernel (non-linear) Support Vector Machine is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using a Linear Support Vector Machine or a Stochastic Gradient descent model instead. The multiclass support is handled according to a one-vs-one scheme. Corresponding estimators are: SVC for classification tasks. SVR for regression tasks. Read more in sklearn's documentation .","title":"Kernel-SVM (kSVM)"},{"location":"API/models/ksvm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The degree parameter is only used when kernel = \"poly\". The gamma parameter is always set to \"scale\" when kernel = \"poly\". The coef0 parameter is only used when kernel = \"rbf\". The random_state parameter is set equal to that of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") kernel: str, default=\"rbf\" Categorical([\"poly\", \"rbf\", \"sigmoid\"], name=\"kernel\") degree: int, default=3 Integer(2, 5, name=\"degree\"). gamma: str, default=\"scale\" Categorical([\"scale\", \"auto\"], name=\"gamma\") coef0: float, default=0 Real(-1.0, 1.0, name=\"coef0\"). shrinking: bool, default=True Categorical([True, False], name=\"shrinking\")","title":"Hyperparameters"},{"location":"API/models/ksvm/#attributes","text":"","title":"Attributes"},{"location":"API/models/ksvm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. 
test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ksvm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ksvm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ksvm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X) . 
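As a hedged illustration of calling these methods directly from the model (it assumes kSVM was run on a classification task and X_new is a hypothetical array of unseen samples):

```python
# Sketch only; X_new is a placeholder for unseen samples.
preds = atom.ksvm.predict(X_new)             # prediction method called on the model
scores = atom.ksvm.decision_function(X_new)  # SVC exposes decision scores, not probabilities
atom.ksvm.plot_permutation_importance()      # plots work the same way
print(atom.ksvm.scoring("fpr"))              # custom metric: false positive rate on the test set
```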
The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ksvm/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"kSVM\", metric=\"r2\", est_params={\"kernel\": \"rbf\"})","title":"Example"},{"location":"API/models/lasso/","text":"Lasso Regression (Lasso) Linear least squares with l1 regularization. Corresponding estimators are: Lasso for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. 
test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() or atom.lasso.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. 
Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Lasso\")","title":"Lasso"},{"location":"API/models/lasso/#lasso-regression-lasso","text":"Linear least squares with l1 regularization. Corresponding estimators are: Lasso for regression tasks. Read more in sklearn's documentation .","title":"Lasso Regression (Lasso)"},{"location":"API/models/lasso/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\")","title":"Hyperparameters"},{"location":"API/models/lasso/#attributes","text":"","title":"Attributes"},{"location":"API/models/lasso/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lasso/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. 
metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lasso/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lasso/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() or atom.lasso.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lasso/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Lasso\")","title":"Example"},{"location":"API/models/lda/","text":"Linear Discriminant Analysis (LDA) Linear Discriminant Analysis is a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. Corresponding estimators are: LinearDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The shrinkage parameter is not used when solver = \"svd\". Dimensions: solver: str, default=\"svd\" Categorical([\"svd\", \"lsqr\", \"eigen\"], name=\"solver\") shrinkage: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"shrinkage\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. 
std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lda.plot_permutation_importance() or atom.lda.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". 
**kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LDA\")","title":"Linear Discriminant Analysis"},{"location":"API/models/lda/#linear-discriminant-analysis-lda","text":"Linear Discriminant Analysis is a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. Corresponding estimators are: LinearDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation .","title":"Linear Discriminant Analysis (LDA)"},{"location":"API/models/lda/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The shrinkage parameter is not used when solver = \"svd\". Dimensions: solver: str, default=\"svd\" Categorical([\"svd\", \"lsqr\", \"eigen\"], name=\"solver\") shrinkage: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"shrinkage\")","title":"Hyperparameters"},{"location":"API/models/lda/#attributes","text":"","title":"Attributes"},{"location":"API/models/lda/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lda/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. 
mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lda/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lda/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lda.plot_permutation_importance() or atom.lda.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. 
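A minimal sketch, not part of the original page, of how the calibrate and reset_predictions methods described above can be combined; X and y stand for any classification dataset:

```python
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)  # X, y: an existing classification dataset
atom.run(models="LDA")

atom.lda.calibrate(cv=5)              # kwargs go to sklearn's CalibratedClassifierCV
probas = atom.lda.predict_proba_test  # recomputed with the calibrated estimator
atom.lda.reset_predictions()          # clear cached predictions to free memory before saving
```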
method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lda/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LDA\")","title":"Example"},{"location":"API/models/lgb/","text":"LightGBM (LGB) LightGBM is a gradient boosting model that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages: Faster training speed and higher efficiency. Lower memory usage. Better accuracy. Capable of handling large-scale data. Corresponding estimators are: LGBMClassifier for classification tasks. LGBMRegressor for regression tasks. Read more in LightGBM's documentation . Note LightGBM allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=-1 Categorical([-1, \\*list(range(1, 10))], name=\"max_depth\") num_leaves: int, default=31 Integer(20, 40, name=\"num_leaves\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") min_child_samples: int, default=20 Integer(10, 30, name=\"min_child_samples\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. 
features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. 
delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"LGB\", metric=\"r2\", n_calls=50, bo_params={\"base_estimator\": \"ET\"})","title":"LightGBM"},{"location":"API/models/lgb/#lightgbm-lgb","text":"LightGBM is a gradient boosting model that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages: Faster training speed and higher efficiency. Lower memory usage. Better accuracy. Capable of handling large-scale data. Corresponding estimators are: LGBMClassifier for classification tasks. LGBMRegressor for regression tasks. Read more in LightGBM's documentation . Note LightGBM allows early stopping to stop the training of unpromising models prematurely!","title":"LightGBM (LGB)"},{"location":"API/models/lgb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. 
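Before the dimensions listed below, a minimal sketch (not taken from the original page) of running the BO for this model and inspecting the utility attributes documented above; X and y stand for any regression dataset:

```python
from atom import ATOMRegressor

atom = ATOMRegressor(X, y)  # X, y: an existing regression dataset
atom.run(models="LGB", metric="r2", n_calls=15)

print(atom.lgb.best_params)      # best hyperparameter combination found by the BO
print(atom.lgb.metric_test)      # r2 of the final estimator on the test set
print(atom.lgb.evals["metric"])  # name of the metric LightGBM tracked during training
```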
Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=-1 Categorical([-1, \\*list(range(1, 10))], name=\"max_depth\") num_leaves: int, default=31 Integer(20, 40, name=\"num_leaves\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") min_child_samples: int, default=20 Integer(10, 30, name=\"min_child_samples\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/lgb/#attributes","text":"","title":"Attributes"},{"location":"API/models/lgb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lgb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. 
Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lgb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lgb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. 
\"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lgb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"LGB\", metric=\"r2\", n_calls=50, bo_params={\"base_estimator\": \"ET\"})","title":"Example"},{"location":"API/models/lr/","text":"Logistic regression (LR) Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function. Corresponding estimators are: LogisticRegression for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is automatically set to \"l2\" when penalty = \"none\" and solver = \"liblinear\". The penalty parameter is automatically set to \"l2\" when penalty = \"l1\" and solver != \"liblinear\" or \"saga\". The penalty parameter is automatically set to \"l2\" when penalty = \"elasticnet\" and solver != \"saga\". The C parameter is not used when penalty = \"none\". The l1_ratio parameter is only used when penalty = \"elasticnet\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") solver: str, default=\"lbfgs\" Categorical([\"lbfgs\", \"newton-cg\", \"liblinear\", \"sag\", \"saga\"], name=\"solver\") max_iter: int, default=100 Integer(100, 1000, name=\"max_iter\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. 
\"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lr.plot_permutation_importance() or atom.lr.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. 
After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LR\")","title":"Logistic Regression"},{"location":"API/models/lr/#logistic-regression-lr","text":"Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function. Corresponding estimators are: LogisticRegression for classification tasks. Read more in sklearn's documentation .","title":"Logistic regression (LR)"},{"location":"API/models/lr/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is automatically set to \"l2\" when penalty = \"none\" and solver = \"liblinear\". The penalty parameter is automatically set to \"l2\" when penalty = \"l1\" and solver != \"liblinear\" or \"saga\". The penalty parameter is automatically set to \"l2\" when penalty = \"elasticnet\" and solver != \"saga\". The C parameter is not used when penalty = \"none\". The l1_ratio parameter is only used when penalty = \"elasticnet\". The n_jobs and random_state parameters are set equal to those of the trainer. 
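As a hedged illustration of the penalty/solver constraints above, the sketch below fixes a compatible combination through est_params (the customization hook referenced elsewhere in these docs; its exact wiring is an assumption here) and lets the BO tune the remaining dimensions listed next; X and y stand for any classification dataset:

```python
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)  # X, y: an existing classification dataset

# elasticnet is only supported by the saga solver, so fix both up front;
# the BO then tunes the remaining dimensions (C, max_iter, l1_ratio)
atom.run(
    models="LR",
    n_calls=10,
    est_params={"penalty": "elasticnet", "solver": "saga"},
)
```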
Dimensions: penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") solver: str, default=\"lbfgs\" Categorical([\"lbfgs\", \"newton-cg\", \"liblinear\", \"sag\", \"saga\"], name=\"solver\") max_iter: int, default=100 Integer(100, 1000, name=\"max_iter\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\")","title":"Hyperparameters"},{"location":"API/models/lr/#attributes","text":"","title":"Attributes"},{"location":"API/models/lr/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lr/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lr/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. 
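A short sketch of how the lazily computed attributes listed directly below might be accessed; X and y stand for any classification dataset:

```python
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)  # X, y: an existing classification dataset
atom.run(models="LR")

# The first access computes and caches the attribute; later accesses reuse the cached value
probas = atom.lr.predict_proba_test
margins = atom.lr.decision_function_test
```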
Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lr/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lr.plot_permutation_importance() or atom.lr.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lr/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LR\")","title":"Example"},{"location":"API/models/lsvm/","text":"Linear-SVM (lSVM) Similar to Kernel-SVM but with a linear kernel. Implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. The multiclass support is handled according to a one-vs-rest scheme. Corresponding estimators are: LinearSVC for classification tasks. LinearSVR for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is only used with LinearSVC. The penalty parameter is always set to \"l2\" when loss = \"hinge\". The dual parameter is automatically set to False when penalty = \"l1\" and loss = \"squared_hinge\". The random_state parameter is set equal to that of the training instance. Dimensions: loss: str classifier: default=\"squared_hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") penalty: str, default=\"l2\" Categorical([\"l1\", \"l2\"], name=\"penalty\"). Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. 
std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. 
If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"lSVM\", metric=\"accuracy\", n_calls=10)","title":"Linear-SVM"},{"location":"API/models/lsvm/#linear-svm-lsvm","text":"Similar to Kernel-SVM but with a linear kernel. Implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. The multiclass support is handled according to a one-vs-rest scheme. Corresponding estimators are: LinearSVC for classification tasks. LinearSVR for regression tasks. Read more in sklearn's documentation .","title":"Linear-SVM (lSVM)"},{"location":"API/models/lsvm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is only used with LinearSVC. The penalty parameter is always set to \"l2\" when loss = \"hinge\". The dual parameter is automatically set to False when penalty = \"l1\" and loss = \"squared_hinge\". The random_state parameter is set equal to that of the training instance. Dimensions: loss: str classifier: default=\"squared_hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") penalty: str, default=\"l2\" Categorical([\"l1\", \"l2\"], name=\"penalty\").","title":"Hyperparameters"},{"location":"API/models/lsvm/#attributes","text":"","title":"Attributes"},{"location":"API/models/lsvm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lsvm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lsvm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lsvm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. 
Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lsvm/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"lSVM\", metric=\"accuracy\", n_calls=10)","title":"Example"},{"location":"API/models/mlp/","text":"Multi-layer Perceptron (MLP) Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function by training on a dataset. Given a set of features and a target, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Corresponding estimators are: MLPClassifier for classification tasks. MLPRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The MLP optimizes between one and three hidden layers with the BO. For more layers, use est_params . The learning_rate and power_t parameters are only used when solver = \"sgd\". The learning_rate_init parameter is only used when solver != \"lbfgs\". The random_state parameter is set equal to that of the trainer. Dimensions: hidden_layer_sizes: tuple, default=(100,) Integer(10, 100, name=\"hidden_layer_1\") Integer(0, 100, name=\"hidden_layer_2\") Integer(0, 100, name=\"hidden_layer_3\") activation: str, default=\"relu\" Categorical([\"identity\", \"logistic\", \"tanh\", \"relu\"], name=\"activation\") solver: str, default=\"adam\" Categorical([\"lbfgs\", \"sgd\", \"adam\"], name=\"solver\") alpha: float, default=1e-4 Real(1e-4, 0.1, \"log-uniform\", name=\"alpha\") batch_size: int, default=200 Integer(8, 250, name=\"batch_size\") learning_rate: str, default=\"constant\" Categorical([\"constant\", \"invscaling\", \"adaptive\"], name=\"learning_rate\"). learning_rate_init: float, default=1e-3 Real(1e-3, 0.1, \"log-uniform\", name=\"learning_rate_init\").
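The note above says the BO only tunes up to three hidden layers and that deeper architectures should go through est_params. A hedged sketch of what that could look like (the four-layer tuple is an arbitrary illustration, not a recommended architecture; X and y are assumed to be loaded):
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)

# Values passed through est_params are fixed for the estimator instead of
# being chosen by the BO, so a deeper network can be set up front.
atom.run(
    models="MLP",
    n_calls=15,
    est_params={"hidden_layer_sizes": (100, 50, 25, 10)},
)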
power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\"). max_iter: int, default=200 Integer(50, 500, name=\"max_iter\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). 
predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. 
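A short, hedged illustration of the calibrate behaviour described above, assuming atom is an ATOMClassifier that has already run an MLP (calibration is classifier-only). The keyword arguments are forwarded to sklearn's CalibratedClassifierCV:
# Default usage: the calibrator is cross-validated on the training data.
atom.mlp.calibrate(cv=5, method="sigmoid")

# cv="prefit" reuses the already fitted model and fits the calibrator on the
# test set. As noted above, this leaks the test set, so only do it when an
# independent holdout set is available for the final evaluation.
# atom.mlp.calibrate(cv="prefit")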
Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"MLP\", n_calls=20, est_params={\"solver\": \"sgd\", \"activation\": \"relu\"})","title":"Multi-layer Perceptron"},{"location":"API/models/mlp/#multi-layer-perceptron-mlp","text":"Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function by training on a dataset. Given a set of features and a target, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Corresponding estimators are: MLPClassifier for classification tasks. MLPRegressor for regression tasks. Read more in sklearn's documentation .","title":"Multi-layer Perceptron (MLP)"},{"location":"API/models/mlp/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The MLP optimizes between one and three hidden layers with the BO. For more layers, use est_params . The learning_rate and power_t parameters are only used when solver = \"sgd\". The learning_rate_init parameter is only used when solver != \"lbfgs\". The random_state parameter is set equal to that of the trainer. Dimensions: hidden_layer_sizes: tuple, default=(100,) Integer(10, 100, name=\"hidden_layer_1\") Integer(0, 100, name=\"hidden_layer_2\") Integer(0, 100, name=\"hidden_layer_3\") activation: str, default=\"relu\" Categorical([\"identity\", \"logistic\", \"tanh\", \"relu\"], name=\"activation\") solver: str, default=\"adam\" Categorical([\"lbfgs\", \"sgd\", \"adam\"], name=\"solver\") alpha: float, default=1e-4 Real(1e-4, 0.1, \"log-uniform\", name=\"alpha\") batch_size: int, default=200 Integer(8, 250, name=\"batch_size\") learning_rate: str, default=\"constant\" Categorical([\"constant\", \"invscaling\", \"adaptive\"], name=\"learning_rate\"). learning_rate_init: float, default=1e-3 Real(1e-3, 0.1, \"log-uniform\", name=\"learning_rate_init\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\"). max_iter: int, default=200 Integer(50, 500, name=\"max_iter\")","title":"Hyperparameters"},{"location":"API/models/mlp/#attributes","text":"","title":"Attributes"},{"location":"API/models/mlp/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/mlp/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. 
\"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/mlp/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/mlp/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. 
The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/mlp/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"MLP\", n_calls=20, est_params={\"solver\": \"sgd\", \"activation\": \"relu\"})","title":"Example"},{"location":"API/models/mnb/","text":"Multinomial Naive Bayes (MNB) Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially distributed data, and is one of the two classic Naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). Corresponding estimators are: MultinomialNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. 
save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"MNB\", metric=\"precision\")","title":"Multinomial Naive Bayes"},{"location":"API/models/mnb/#multinomial-naive-bayes-mnb","text":"Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially distributed data, and is one of the two classic Naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). Corresponding estimators are: MultinomialNB for classification tasks. Read more in sklearn's documentation .","title":"Multinomial Naive Bayes (MNB)"},{"location":"API/models/mnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/mnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/mnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. 
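To illustrate the scoring method described above, a minimal sketch (again assuming X and y hold a classification dataset, e.g. word-count features for MNB):
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(models="MNB", metric="precision")

print(atom.mnb.scoring())                       # final results for the model
print(atom.mnb.scoring("cm"))                   # confusion matrix on the test set
print(atom.mnb.scoring("f1", dataset="train"))  # any sklearn SCORER, on train or test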
train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/mnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/mnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/mnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/mnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"MNB\", metric=\"precision\")","title":"Example"},{"location":"API/models/ols/","text":"Ordinary Least Squares (OLS) Ordinary Least Squares is just linear regression without any regularization. It fits a linear model with coefficients w = (w1, \u2026, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Corresponding estimators are: LinearRegression for regression tasks. Read more in sklearn's documentation . 
Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. OLS has no parameters to tune with the BO. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. 
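Since OLS has no parameters to tune with the BO, a run needs no n_calls. The sketch below also shows save_estimator, which pickles the fitted LinearRegression; the file name is illustrative, and X and y are assumed to be loaded.
from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
atom.run(models="OLS")                    # nothing to tune with the BO

atom.ols.save_estimator("ols_estimator")  # pickle of the fitted LinearRegression
# The saved file can later be reloaded with pickle.load, independently of atom.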
method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's [SCORERS](https://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"OLS\")","title":"Ordinary Least Squares"},{"location":"API/models/ols/#ordinary-least-squares-ols","text":"Ordinary Least Squares is just linear regression without any regularization. It fits a linear model with coefficients w = (w1, \u2026, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Corresponding estimators are: LinearRegression for regression tasks. Read more in sklearn's documentation .","title":"Ordinary Least Squares (OLS)"},{"location":"API/models/ols/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. OLS has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/ols/#attributes","text":"","title":"Attributes"},{"location":"API/models/ols/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ols/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. 
metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ols/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ols/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's [SCORERS](https://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ols/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"OLS\")","title":"Example"},{"location":"API/models/pa/","text":"Passive Aggressive (PA) The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C. Corresponding estimators are: PassiveAggressiveClassifier for classification tasks. PassiveAggressiveRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. 
Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") average: float, default=False Categorical([True, False], name=\"average\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). 
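As described above, prediction attributes are computed lazily and cached on first access, and reset_predictions drops that cache again. A hedged sketch with the Passive Aggressive classifier (X and y assumed loaded):
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(models="PA", metric="f1")

print(atom.pa.decision_function_test[:5])  # computed on first access, then cached
print(atom.pa.score_test)                  # also cached after this call

# Drop the cached predictions to shrink the instance before saving it.
atom.pa.reset_predictions()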
decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"PA\", metric=\"f1\")","title":"Passive Aggressive"},{"location":"API/models/pa/#passive-aggressive-pa","text":"The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C. Corresponding estimators are: PassiveAggressiveClassifier for classification tasks. PassiveAggressiveRegressor for regression tasks. 
Read more in sklearn's documentation .","title":"Passive Aggressive (PA)"},{"location":"API/models/pa/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") average: float, default=False Categorical([True, False], name=\"average\")","title":"Hyperparameters"},{"location":"API/models/pa/#attributes","text":"","title":"Attributes"},{"location":"API/models/pa/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/pa/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. 
time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/pa/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/pa/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. 
Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/pa/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"PA\", metric=\"f1\")","title":"Example"},{"location":"API/models/qda/","text":"Quadratic Discriminant Analysis (QDA) Quadratic Discriminant Analysis is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, where each class has its own covariance matrix. Corresponding estimators are: QuadraticDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: reg_param: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"reg_param\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results.
std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.qda.plot_permutation_importance() or atom.qda.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". 
**kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"QDA\")","title":"Quadratic Discriminant Analysis"},{"location":"API/models/qda/#quadratic-discriminant-analysis-qda","text":"Quadratic Discriminant Analysis is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, where each class has its own covariance matrix. Corresponding estimators are: QuadraticDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation .","title":"Quadratic Discriminant Analysis (QDA)"},{"location":"API/models/qda/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: reg_param: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"reg_param\")","title":"Hyperparameters"},{"location":"API/models/qda/#attributes","text":"","title":"Attributes"},{"location":"API/models/qda/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/qda/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results.
List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/qda/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/qda/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.qda.plot_permutation_importance() or atom.qda.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"QDA\")","title":"Example"},{"location":"API/models/rf/","text":"Random Forest (RF) Random forests are an ensemble learning method that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees' habit of overfitting to their training set. Corresponding estimators are: RandomForestClassifier for classification tasks. RandomForestRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set.
y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn.
The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"RF\", metric=\"mae\", n_calls=20, n_initial_points=10)","title":"Random Forest"},{"location":"API/models/rf/#random-forest-rf","text":"Random forests are an ensemble learning method that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees\" habit of overfitting to their training set. Corresponding estimators are: RandomForestClassifier for classification tasks. RandomForestRegressor for regression tasks. Read more in sklearn's documentation .","title":"Random Forest (RF)"},{"location":"API/models/rf/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. 
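For concreteness, here is a minimal sketch (not part of the original page) of fixing some of the dimensions listed below instead of letting the BO tune them, reusing the est_params and n_calls arguments that appear in the examples on these pages. X and y are placeholders for your own data and the chosen values are purely illustrative.
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)  # X, y: placeholder feature matrix and target
atom.run(
    models="RF",
    metric="f1",
    n_calls=15,  # number of bayesian optimization iterations, as in the example further down
    est_params={"bootstrap": True, "max_samples": 0.8, "n_estimators": 300},  # fixed values, not tuned defaults
)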
Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\")","title":"Hyperparameters"},{"location":"API/models/rf/#attributes","text":"","title":"Attributes"},{"location":"API/models/rf/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/rf/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results.
List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/rf/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/rf/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter).
dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/rf/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"RF\", metric=\"mae\", n_calls=20, n_initial_points=10)","title":"Example"},{"location":"API/models/ridge/","text":"Ridge Classification/Regression (Ridge) Linear least squares with l2 regularization. Corresponding estimators are: RidgeClassifier for classification tasks. Ridge for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") solver: str, default=\"auto\" Categorical([\"auto\", \"svd\", \"cholesky\", \"lsqr\", \"sparse_cg\", \"sag\", \"saga\"], name=\"solver\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. 
results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ridge.plot_permutation_importance() or atom.ridge.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. 
\"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Ridge\")","title":"Ridge"},{"location":"API/models/ridge/#ridge-classificationregression-ridge","text":"Linear least squares with l2 regularization. Corresponding estimators are: RidgeClassifier for classification tasks. Ridge for regression tasks. Read more in sklearn's documentation .","title":"Ridge Classification/Regression (Ridge)"},{"location":"API/models/ridge/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") solver: str, default=\"auto\" Categorical([\"auto\", \"svd\", \"cholesky\", \"lsqr\", \"sparse_cg\", \"sag\", \"saga\"], name=\"solver\")","title":"Hyperparameters"},{"location":"API/models/ridge/#attributes","text":"","title":"Attributes"},{"location":"API/models/ridge/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ridge/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ridge/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ridge/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ridge.plot_permutation_importance() or atom.ridge.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. 
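As a hedged illustration of the calibrate behaviour described above (calibration only applies when Ridge is run as a classifier), the keyword arguments are forwarded to CalibratedClassifierCV; the cv values below are examples rather than recommendations, and X, y stand for your own data.
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)  # X, y: placeholder classification data
atom.run(models="Ridge", metric="f1")
# Default behaviour: the calibrator is fitted with cross-validation on the training set.
atom.ridge.calibrate(cv=5)
# cv="prefit" fits the calibrator on the test set instead, which leaks test data;
# only use it when an independent holdout set remains for the final evaluation.
# atom.ridge.calibrate(cv="prefit")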
method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ridge/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Ridge\")","title":"Example"},{"location":"API/models/rnn/","text":"Radius Nearest Neighbors (RNN) Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors are selected from within a given radius. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. Corresponding estimators are: RadiusNeighborsClassifier for classification tasks. RadiusNeighborsRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The outlier_label parameter is set by default to \"most_frequent\" to avoid errors when encountering outliers. The n_jobs parameter is set equal to that of the trainer. Dimensions: radius: float, default=mean(distances) Real(min(distances), max(distances), name=\"radius\") Since the optimal radius depends hugely on the data, ATOM's RNN implementation doesn't use sklearn's default radius of 1, but instead calculates the [minkowsky distance](https://en.wikipedia.org/wiki/Minkowski_distance) between 10% of random samples in the training set and uses the mean of those distances as default radius. The lower and upper bounds of the radius\" dimensions for the BO are given by the minimum and maximum value of the calculated distances. weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. 
n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() or atom.rnn.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. 
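By way of a short, hedged example of the scoring and save_estimator utilities just listed (the metric names come from the custom metrics documented on this page and sklearn's SCORERS; the filename and data are placeholders):
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)  # X, y: placeholder classification data
atom.run(models="RNN", metric="precision", est_params={"radius": 3.5})
atom.rnn.scoring()                            # final results for the metric used in run
atom.rnn.scoring("lift")                      # custom classification metric on the test set
atom.rnn.scoring("roc_auc", dataset="train")  # any sklearn scorer, here on the training set
atom.rnn.save_estimator("RNN_estimator")      # pickle the fitted estimator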
method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"RNN\", metric=\"precision\", est_params={\"radius\": 3.5})","title":"Radius Nearest Neighbors"},{"location":"API/models/rnn/#radius-nearest-neighbors-rnn","text":"Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors are selected from within a given radius. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. Corresponding estimators are: RadiusNeighborsClassifier for classification tasks. RadiusNeighborsRegressor for regression tasks. Read more in sklearn's documentation .","title":"Radius Nearest Neighbors (RNN)"},{"location":"API/models/rnn/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The outlier_label parameter is set by default to \"most_frequent\" to avoid errors when encountering outliers. The n_jobs parameter is set equal to that of the trainer. 
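The radius dimension listed below takes its default from pairwise distances between a sample of training rows rather than from sklearn's default of 1. A rough sketch of that idea (an illustration only, not ATOM's actual implementation), assuming X_train is a numeric NumPy array:
import numpy as np
from scipy.spatial.distance import pdist
rng = np.random.default_rng(0)
n = max(2, len(X_train) // 10)                       # roughly 10% of the training rows
sample = X_train[rng.choice(len(X_train), size=n, replace=False)]
distances = pdist(sample, metric="minkowski", p=2)   # pairwise Minkowski distances
default_radius = distances.mean()                    # mean distance used as the default radius
bo_bounds = (distances.min(), distances.max())       # lower/upper bounds of the radius dimension for the BO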
Dimensions: radius: float, default=mean(distances) Real(min(distances), max(distances), name=\"radius\") Since the optimal radius depends strongly on the data, ATOM's RNN implementation doesn't use sklearn's default radius of 1, but instead calculates the [Minkowski distance](https://en.wikipedia.org/wiki/Minkowski_distance) between 10% of random samples in the training set and uses the mean of those distances as the default radius. The lower and upper bounds of the \"radius\" dimension for the BO are given by the minimum and maximum values of the calculated distances. weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\")","title":"Hyperparameters"},{"location":"API/models/rnn/#attributes","text":"","title":"Attributes"},{"location":"API/models/rnn/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/rnn/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the Bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. 
std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/rnn/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/rnn/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() or atom.rnn.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). 
dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/rnn/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"RNN\", metric=\"precision\", est_params={\"radius\": 3.5})","title":"Example"},{"location":"API/models/sgd/","text":"Stochastic Gradient Descent (SGD) Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. Corresponding estimators are: SGDClassifier for classification tasks. SGDRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The l1_ratio parameter is only used when penalty = \"elasticnet\". The eta0 parameter is only used when learning_rate != \"optimal\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"log\", \"modified_huber\", \"squared_hinge\", \"perceptron\"], name=\"loss\") regressor: default=\"squared_loss\" Categorical([\"squared_loss\", \"huber\", \"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") alpha: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.15 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\"). epsilon: float, default=0.1 Real(1e-4, 1.0, \"log-uniform\", name=\"epsilon\") learning_rate: str, default=\"optimal\" Categorical([\"constant\", \"invscaling\", \"optimal\", \"adaptive\"], name = \"learning_rate\") eta0: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"eta0\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\") average: bool, default=False Categorical([True, False], name=\"average\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. 
\"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. 
Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"SGD\", metric=\"recall\", bo_params={\"cv\": 3})","title":"Stochastic Gradient Descent"},{"location":"API/models/sgd/#stochastic-gradient-descent-sgd","text":"Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. Corresponding estimators are: SGDClassifier for classification tasks. SGDRegressor for regression tasks. Read more in sklearn's documentation .","title":"Stochastic Gradient Descent (SGD)"},{"location":"API/models/sgd/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The l1_ratio parameter is only used when penalty = \"elasticnet\". The eta0 parameter is only used when learning_rate != \"optimal\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"log\", \"modified_huber\", \"squared_hinge\", \"perceptron\"], name=\"loss\") regressor: default=\"squared_loss\" Categorical([\"squared_loss\", \"huber\", \"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") alpha: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.15 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\"). 
epsilon: float, default=0.1 Real(1e-4, 1.0, \"log-uniform\", name=\"epsilon\") learning_rate: str, default=\"optimal\" Categorical([\"constant\", \"invscaling\", \"optimal\", \"adaptive\"], name = \"learning_rate\") eta0: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"eta0\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\") average: bool, default=False Categorical([True, False], name=\"average\")","title":"Hyperparameters"},{"location":"API/models/sgd/#attributes","text":"","title":"Attributes"},{"location":"API/models/sgd/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/sgd/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/sgd/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. 
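A short sketch of how this lazy behaviour looks in practice for the SGD model (assuming X and y are already loaded):

```python
from atom import ATOMClassifier

# Assumes X and y are already loaded.
atom = ATOMClassifier(X, y)
atom.run(models="SGD", metric="recall")

# Nothing is pre-computed: the decision function scores are calculated
# on this first access and cached on the model afterwards.
scores = atom.sgd.decision_function_test

# The cache can be cleared again to free memory, e.g. before saving.
atom.sgd.reset_predictions()
```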
Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/sgd/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/sgd/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"SGD\", metric=\"recall\", bo_params={\"cv\": 3})","title":"Example"},{"location":"API/models/tree/","text":"Decision Tree (Tree) A single decision tree classifier/regressor. Corresponding estimators are: DecisionTreeClassifier for classification tasks. DecisionTreeRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") splitter: str, default=\"best\" Categorical([\"best\", \"random\"], name=\"splitter\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0.0 Real(0, 0.035, name=\"ccp_alpha\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. 
results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.tree.plot_permutation_importance() or atom.tree.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. 
\"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Tree\", metric=\"MSLE\")","title":"Decision Tree"},{"location":"API/models/tree/#decision-tree-tree","text":"A single decision tree classifier/regressor. Corresponding estimators are: DecisionTreeClassifier for classification tasks. DecisionTreeRegressor for regression tasks. Read more in sklearn's documentation .","title":"Decision Tree (Tree)"},{"location":"API/models/tree/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") splitter: str, default=\"best\" Categorical([\"best\", \"random\"], name=\"splitter\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0.0 Real(0, 0.035, name=\"ccp_alpha\")","title":"Hyperparameters"},{"location":"API/models/tree/#attributes","text":"","title":"Attributes"},{"location":"API/models/tree/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/tree/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/tree/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/tree/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.tree.plot_permutation_importance() or atom.tree.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/tree/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Tree\", metric=\"MSLE\")","title":"Example"},{"location":"API/models/xgb/","text":"XGBoost (XGB) XGBoost is an optimized distributed gradient boosting model designed to be highly efficient, flexible and portable. XGBoost provides a parallel tree boosting that solve many data science problems in a fast and accurate way. Corresponding estimators are: XGBClassifier for classification tasks. XGBRegressor for regression tasks. Read more in XGBoost's documentation . Note XGBoost allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=6 Integer(1, 10, name=\"max_depth\") gamma: float, default=0.0 Real(0, 1.0, name=\"gamma\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_tree: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_tree\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=1.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes You can use the same data attributes as the trainers to check the dataset that was used to fit a particular model. 
These can differ from each other if the model needs scaled features and the data wasn't already scaled. Note that, unlike with the training instances, these attributes not be updated (i.e. they have no @setter ). Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X) . 
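For instance (a minimal sketch, assuming X and y are already loaded and the xgboost package is installed; the run arguments mirror the example at the bottom of this page):

```python
from atom import ATOMRegressor

# Assumes X and y are already loaded.
atom = ATOMRegressor(X, y)
atom.run(models="XGB", metric="me", n_calls=25, bo_params={"cv": 1})

# Plots and predictions are available directly on the model...
atom.xgb.plot_permutation_importance()
predictions = atom.xgb.predict(X)

# ...and evals holds the scores XGBoost reported during training.
print(atom.xgb.evals["metric"], atom.xgb.evals["train"][-1])
```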
The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"XGB\", metric=\"me\", n_calls=25, bo_params={\"cv\": 1})","title":"XGBoost"},{"location":"API/models/xgb/#xgboost-xgb","text":"XGBoost is an optimized distributed gradient boosting model designed to be highly efficient, flexible and portable. XGBoost provides a parallel tree boosting that solve many data science problems in a fast and accurate way. Corresponding estimators are: XGBClassifier for classification tasks. XGBRegressor for regression tasks. Read more in XGBoost's documentation . Note XGBoost allows early stopping to stop the training of unpromising models prematurely!","title":"XGBoost (XGB)"},{"location":"API/models/xgb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=6 Integer(1, 10, name=\"max_depth\") gamma: float, default=0.0 Real(0, 1.0, name=\"gamma\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_tree: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_tree\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=1.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/xgb/#attributes","text":"","title":"Attributes"},{"location":"API/models/xgb/#data-attributes","text":"You can use the same data attributes as the trainers to check the dataset that was used to fit a particular model. These can differ from each other if the model needs scaled features and the data wasn't already scaled. Note that, unlike with the training instances, these attributes not be updated (i.e. they have no @setter ).","title":"Data attributes"},{"location":"API/models/xgb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. 
time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/xgb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/xgb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". 
**kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/xgb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"XGB\", metric=\"me\", n_calls=25, bo_params={\"cv\": 1})","title":"Example"},{"location":"API/plots/bar_plot/","text":"bar_plot method bar_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature column is plotted. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's bar plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.bar_plot() atom.bar_plot(index=120)","title":"bar_plot"},{"location":"API/plots/bar_plot/#bar_plot","text":"method bar_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature column is plotted. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. 
title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's bar plot .","title":"bar_plot"},{"location":"API/plots/bar_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.bar_plot() atom.bar_plot(index=120)","title":"Example"},{"location":"API/plots/beeswarm_plot/","text":"beeswarm_plot method beeswarm_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The beeswarm plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's beeswarm plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.beeswarm_plot()","title":"beeswarm_plot"},{"location":"API/plots/beeswarm_plot/#beeswarm_plot","text":"method beeswarm_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The beeswarm plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. 
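Because bar_plot and beeswarm_plot raise an exception when several models are selected, a common pattern is to call them from a single model and use the index parameter to pick rows. The sketch below is a hedged illustration on the breast cancer data with an RF model; the row numbers are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run(["RF", "LGB"])

# Calling the plot from the model avoids the multiple-models exception.
atom.rf.bar_plot(index=120, show=10)        # SHAP values of a single sample
atom.rf.bar_plot(index=(0, 100), show=10)   # Mean |SHAP| over rows 0 until 100
atom.rf.beeswarm_plot(index=(0, 100))       # Beeswarm needs multiple rows
```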
figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's beeswarm plot .","title":"beeswarm_plot"},{"location":"API/plots/beeswarm_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.beeswarm_plot()","title":"Example"},{"location":"API/plots/decision_plot/","text":"decision_plot method decision_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple predictions are plotted together, feature values will not be printed. Plotting too many predictions together will make the plot unintelligible. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, select all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's decision plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.decision_plot() atom.decision_plot(index=120)","title":"decision_plot"},{"location":"API/plots/decision_plot/#decision_plot","text":"method decision_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple predictions are plotted together, feature values will not be printed. Plotting too many predictions together will make the plot unintelligible. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. 
If tuple (n, m), it selects rows n until m. If None, select all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's decision plot .","title":"decision_plot"},{"location":"API/plots/decision_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.decision_plot() atom.decision_plot(index=120)","title":"Example"},{"location":"API/plots/force_plot/","text":"force_plot method force_plot (models=None, index=None, target=1, title=None, figsize=(14, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use matplotlib=True (this option is only available when only a single sample is plotted). Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(14, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If matplotlib=False, the figure is saved as an html file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's force plot . Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"lr\") atom.force_plot(index=120, matplotlib=True, filename=\"force_plot\")","title":"force_plot"},{"location":"API/plots/force_plot/#force_plot","text":"method force_plot (models=None, index=None, target=1, title=None, figsize=(14, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use matplotlib=True (this option is only available when only a single sample is plotted). Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. 
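To make the javascript-versus-matplotlib distinction above concrete, here is a hedged sketch: matplotlib=True only applies to a single sample, while the default javascript rendering is what gets written to an html file when a filename is given. The file names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run("lr")

# Single sample: a regular matplotlib figure can be drawn and saved.
atom.lr.force_plot(index=120, matplotlib=True, filename="force_single")

# Many samples: rendered with javascript and saved as an html file.
atom.lr.force_plot(index=(0, 100), filename="force_many")
```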
index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(14, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If matplotlib=False, the figure is saved as an html file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's force plot .","title":"force_plot"},{"location":"API/plots/force_plot/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"lr\") atom.force_plot(index=120, matplotlib=True, filename=\"force_plot\")","title":"Example"},{"location":"API/plots/heatmap_plot/","text":"heatmap_plot method heatmap_plot (models=None, index=None, show=None, target=1, title=None, figsize=(8, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original feature values but by their explanations. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The heatmap plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's heatmap plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.heatmap_plot()","title":"heatmap_plot"},{"location":"API/plots/heatmap_plot/#heatmap_plot","text":"method heatmap_plot (models=None, index=None, show=None, target=1, title=None, figsize=(8, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original feature values but by their explanations. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. 
To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The heatmap plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's heatmap plot .","title":"heatmap_plot"},{"location":"API/plots/heatmap_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.heatmap_plot()","title":"Example"},{"location":"API/plots/plot_bo/","text":"plot_bo method plot_bo (models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True) [source] Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by bo_params={\"plot_bo\": True} while running the optimization. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline that used bayesian optimization are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 8)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LDA\", \"LGB\"], metric=\"f1\", n_calls=24, n_initial_points=10) atom.plot_bo()","title":"plot_bo"},{"location":"API/plots/plot_bo/#plot_bo","text":"method plot_bo (models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True) [source] Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by bo_params={\"plot_bo\": True} while running the optimization. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline that used bayesian optimization are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 8)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
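A small, hedged sketch of the plot described above; it assumes the breast cancer data and a modest bayesian optimization budget, so the exact curves will differ from run to run.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)

# Models must have run the hyperparameter optimization for plot_bo to work.
# bo_params={"plot_bo": True} would draw the same plot live during the search.
atom.run(["LDA", "LGB"], metric="f1", n_calls=15, n_initial_points=5)

atom.plot_bo()      # Both models, first (only) metric
atom.lgb.plot_bo()  # The same plot called from a single model
```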
display: bool, optional (default=True) Whether to render the plot.","title":"plot_bo"},{"location":"API/plots/plot_bo/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LDA\", \"LGB\"], metric=\"f1\", n_calls=24, n_initial_points=10) atom.plot_bo()","title":"Example"},{"location":"API/plots/plot_calibration/","text":"plot_calibration method plot_calibration (models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True) [source] Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approx. 80% actually belong to the positive class. Read more in sklearn's documentation . This figure shows two plots: the calibration curve, where the x-axis represents the average predicted probability in each bin and the y-axis is the fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin); and a distribution of all predicted probabilities of the classifier. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. n_bins: int, optional (default=10) Number of bins for the calibration calculation and the histogram. Minimum of 5 required. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"LR\", \"LGB\"], metric=\"average_precision\") atom.plot_calibration()","title":"plot_calibration"},{"location":"API/plots/plot_calibration/#plot_calibration","text":"method plot_calibration (models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True) [source] Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approx. 80% actually belong to the positive class. Read more in sklearn's documentation . This figure shows two plots: the calibration curve, where the x-axis represents the average predicted probability in each bin and the y-axis is the fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin); and a distribution of all predicted probabilities of the classifier. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. n_bins: int, optional (default=10) Number of bins for the calibration calculation and the histogram. Minimum of 5 required. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. 
If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_calibration"},{"location":"API/plots/plot_calibration/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"LR\", \"LGB\"], metric=\"average_precision\") atom.plot_calibration()","title":"Example"},{"location":"API/plots/plot_components/","text":"plot_components method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. Only available if PCA was applied on the data. Parameters: show: int or None, optional (default=None) Number of components to show. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_components()","title":"plot_components"},{"location":"API/plots/plot_components/#plot_components","text":"method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. Only available if PCA was applied on the data. Parameters: show: int or None, optional (default=None) Number of components to show. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_components"},{"location":"API/plots/plot_components/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_components()","title":"Example"},{"location":"API/plots/plot_confusion_matrix/","text":"plot_confusion_matrix method plot_confusion_matrix (models=None, dataset=\"test\", normalize=False, title=None, figsize=None, filename=None, display=True) [source] Plot a model's confusion matrix. Only for classification tasks. For 1 model: plot the confusion matrix in a heatmap. For multiple models: compare TP, FP, FN and TN in a barplot (not implemented for multiclass classification tasks). Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the confusion matrix. Options are \"train\" or \"test\". normalize: bool, optional (default=False) Whether to normalize the matrix. Only for the heatmap plot. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to plot type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"Bag\"]) atom.Tree.plot_confusion_matrix(normalize=True) atom.plot_confusion_matrix()","title":"plot_confusion_matrix"},{"location":"API/plots/plot_confusion_matrix/#plot_confusion_matrix","text":"method plot_confusion_matrix (models=None, dataset=\"test\", normalize=False, title=None, figsize=None, filename=None, display=True) [source] Plot a model's confusion matrix. Only for classification tasks. For 1 model: plot the confusion matrix in a heatmap. For multiple models: compare TP, FP, FN and TN in a barplot (not implemented for multiclass classification tasks). Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the confusion matrix. Options are \"train\" or \"test\". normalize: bool, optional (default=False) Whether to normalize the matrix. Only for the heatmap plot. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to plot type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_confusion_matrix"},{"location":"API/plots/plot_confusion_matrix/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"Bag\"]) atom.Tree.plot_confusion_matrix(normalize=True) atom.plot_confusion_matrix()","title":"Example"},{"location":"API/plots/plot_correlation/","text":"plot_correlation method plot_correlation (columns=None, method=\"pearson\", title=None, figsize=(8, 7), filename=None, display=True) [source] Plot the data's correlation matrix. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. method: str, optional (default=\"pearson\") Method of correlation. Choose from \"pearson\", \"kendall\" or \"spearman\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 7)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_correlation()","title":"plot_correlation"},{"location":"API/plots/plot_correlation/#plot_correlation","text":"method plot_correlation (columns=None, method=\"pearson\", title=None, figsize=(8, 7), filename=None, display=True) [source] Plot the data's correlation matrix. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. method: str, optional (default=\"pearson\") Method of correlation. Choose from \"pearson\", \"kendall\" or \"spearman\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 7)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
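A minimal, hedged illustration of the columns and method parameters described above, again assuming the breast cancer data; the column slice is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)

atom.plot_correlation()                                         # All columns, pearson
atom.plot_correlation(columns=slice(0, 10), method="spearman")  # First 10 columns only
```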
display: bool, optional (default=True) Whether to render the plot.","title":"plot_correlation"},{"location":"API/plots/plot_correlation/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_correlation()","title":"Example"},{"location":"API/plots/plot_distribution/","text":"plot_distribution method plot_distribution (columns=0, distribution=None, show=None, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot column distributions. Additionally, it is possible to plot any of scipy.stats probability distributions fitted to the column. Missing values are ignored. Tip Use atom's distribution method to check which distribution fits the column best. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. It is only possible to plot one categorical column. If more than just the one categorical column is selected, all categorical columns are ignored. distribution: str, sequence or None, optional (default=None) Names of the scipy.stats distributions to fit to the column. If None, no distribution is fitted. Only for numerical columns. show: int or None, optional (default=None) Number of classes (ordered by number of occurrences) to show in the plot. None to show all. Only for categorical columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the plot's type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's histplot . Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_distribution(columns=[1, 2]) # With numerical columns atom.plot_distribution(columns=\"mean radius\", distribution=[\"norm\", \"triang\", \"pearson3\"]) # With fitted distributions atom.plot_distribution(columns=\"Location\", show=11) # With categorical columns","title":"plot_distribution"},{"location":"API/plots/plot_distribution/#plot_distribution","text":"method plot_distribution (columns=0, distribution=None, show=None, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot column distributions. Additionally, it is possible to plot any of scipy.stats probability distributions fitted to the column. Missing values are ignored. Tip Use atom's distribution method to check which distribution fits the column best. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. It is only possible to plot one categorical column. If more than just the one categorical column is selected, all categorical columns are ignored. distribution: str, sequence or None, optional (default=None) Names of the scipy.stats distributions to fit to the column. If None, no distribution is fitted. Only for numerical columns. show: int or None, optional (default=None) Number of classes (ordered by number of occurrences) to show in the plot. None to show all. Only for categorical columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the plot's type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
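Following the tip above, a hedged sketch that first asks atom's distribution method which scipy.stats distributions fit a column best and then overlays candidates on the histogram; the column name belongs to the breast cancer data, and the exact output format of distribution is only sketched in the comments.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)

# Rank candidate distributions by their KS-statistic (lower is a better fit).
print(atom.distribution(column="mean radius"))

# Overlay the most promising candidates on the column's histogram.
atom.plot_distribution(columns="mean radius", distribution=["norm", "triang"])
```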
display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's histplot .","title":"plot_distribution"},{"location":"API/plots/plot_distribution/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_distribution(columns=[1, 2]) # With numerical columns atom.plot_distribution(columns=\"mean radius\", distribution=[\"norm\", \"triang\", \"pearson3\"]) # With fitted distributions atom.plot_distribution(columns=\"Location\", show=11) # With categorical columns","title":"Example"},{"location":"API/plots/plot_errors/","text":"plot_errors method plot_errors (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This plot can be useful to detect noise or heteroscedasticity along a range of the target domain. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the errors. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_errors()","title":"plot_errors"},{"location":"API/plots/plot_errors/#plot_errors","text":"method plot_errors (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This plot can be useful to detect noise or heteroscedasticity along a range of the target domain. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the errors. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_errors"},{"location":"API/plots/plot_errors/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_errors()","title":"Example"},{"location":"API/plots/plot_evals/","text":"plot_evals method plot_evals (models=None, dataset=\"both\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation ( XGB , LGB , CatB ). 
The metric is provided by the estimator's package and is different for every model and every task. For this reason, the method only allows plotting one model at a time. Parameters: models: str, sequence or None, optional (default=None) Name of the model to plot. If None, all models in the pipeline are selected. Note that leaving the default option could raise an exception if there are multiple models in the pipeline. To avoid this, call the plot from a model, e.g. atom.lgb.plot_evals() . dataset: str, optional (default=\"both\") Data set on which to calculate the evaluation curves. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"Bag\", \"LGB\"]) atom.lgb.plot_evals()","title":"plot_evals"},{"location":"API/plots/plot_evals/#plot_evals","text":"method plot_evals (models=None, dataset=\"both\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation ( XGB , LGB , CatB ). The metric is provided by the estimator's package and is different for every model and every task. For this reason, the method only allows plotting one model at a time. Parameters: models: str, sequence or None, optional (default=None) Name of the model to plot. If None, all models in the pipeline are selected. Note that leaving the default option could raise an exception if there are multiple models in the pipeline. To avoid this, call the plot from a model, e.g. atom.lgb.plot_evals() . dataset: str, optional (default=\"both\") Data set on which to calculate the evaluation curves. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_evals"},{"location":"API/plots/plot_evals/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"Bag\", \"LGB\"]) atom.lgb.plot_evals()","title":"Example"},{"location":"API/plots/plot_feature_importance/","text":"plot_feature_importance method plot_feature_importance (models=None, show=None, title=None, figsize=None, filename=None, display=True) [source] Plot a tree-based model's feature importance. The importances are normalized in order to be able to compare them between models. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. 
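The note above that the feature_importance attribute is updated after plotting can be shown with a small, hedged sketch; exactly where the attribute lives (on atom and/or on the model) is an assumption of this illustration.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run(["LR", "RF"], metric="recall_weighted")

atom.rf.plot_feature_importance(show=10)

# After plotting, the extracted ranking is available for reuse,
# e.g. to drive a feature selection step (attribute location assumed).
print(atom.feature_importance)
```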
If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\"], metric=\"recall_weighted\") atom.RF.plot_feature_importance(show=11, filename=\"random_forest_importance\")","title":"plot_feature_importance"},{"location":"API/plots/plot_feature_importance/#plot_feature_importance","text":"method plot_feature_importance (models=None, show=None, title=None, figsize=None, filename=None, display=True) [source] Plot a tree-based model's feature importance. The importances are normalized in order to be able to compare them between models. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_feature_importance"},{"location":"API/plots/plot_feature_importance/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\"], metric=\"recall_weighted\") atom.RF.plot_feature_importance(show=11, filename=\"random_forest_importance\")","title":"Example"},{"location":"API/plots/plot_gains/","text":"plot_gains method plot_gains (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the cumulative gains curve. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the gains curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_gains(filename=\"cumulative_gains_curve\")","title":"plot_gains"},{"location":"API/plots/plot_gains/#plot_gains","text":"method plot_gains (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the cumulative gains curve. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the gains curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. 
If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_gains"},{"location":"API/plots/plot_gains/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_gains(filename=\"cumulative_gains_curve\")","title":"Example"},{"location":"API/plots/plot_learning_curve/","text":"plot_learning_curve method plot_learning_curve (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using train sizing . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example import numpy as np from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.train_sizing([\"GNB\", \"LDA\"], metric=\"accuracy\", train_sizes=np.linspace(0.1, 1.0, 9), bagging=5) atom.plot_learning_curve()","title":"plot_learning_curve"},{"location":"API/plots/plot_learning_curve/#plot_learning_curve","text":"method plot_learning_curve (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using train sizing . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_learning_curve"},{"location":"API/plots/plot_learning_curve/#example","text":"import numpy as np from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.train_sizing([\"GNB\", \"LDA\"], metric=\"accuracy\", train_sizes=np.linspace(0.1, 1.0, 9), bagging=5) atom.plot_learning_curve()","title":"Example"},{"location":"API/plots/plot_lift/","text":"plot_lift method plot_lift (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the lift curve. Only for binary classification. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the lift curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_lift(filename=\"lift_curve\")","title":"plot_lift"},{"location":"API/plots/plot_lift/#plot_lift","text":"method plot_lift (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the lift curve. Only for binary classification. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the lift curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_lift"},{"location":"API/plots/plot_lift/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_lift(filename=\"lift_curve\")","title":"Example"},{"location":"API/plots/plot_partial_dependence/","text":"plot_partial_dependence method plot_partial_dependence (models=None, features=None, target=None, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the partial dependence of features. The partial dependence of a feature (or a set of features) corresponds to the average response of the model for each possible value of the feature. Two-way partial dependence plots are plotted as contour plots (only allowed for single model plots). The deciles of the feature values will be shown with tick marks on the x-axes for one-way plots, and on both axes for two-way plots. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. features: int, str, list, tuple or None, optional (default=None) Features or feature pairs (name or index) to get the partial dependence from. Maximum of 3 allowed. If None, it uses the top 3 features if the feature_importance attribute is defined else it uses the first 3 features in the dataset. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=6) atom.run([\"Tree\", \"Bag\"], metric=\"precision\") atom.plot_partial_dependence() atom.tree.plot_partial_dependence(features=[0, 1, (1, 3)])","title":"plot_partial_dependence"},{"location":"API/plots/plot_partial_dependence/#plot_partial_dependence","text":"method plot_partial_dependence (models=None, features=None, target=None, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the partial dependence of features. 
The partial dependence of a feature (or a set of features) corresponds to the average response of the model for each possible value of the feature. Two-way partial dependence plots are plotted as contour plots (only allowed for single model plots). The deciles of the feature values will be shown with tick marks on the x-axes for one-way plots, and on both axes for two-way plots. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. features: int, str, list, tuple or None, optional (default=None) Features or feature pairs (name or index) to get the partial dependence from. Maximum of 3 allowed. If None, it uses the top 3 features if the feature_importance attribute is defined else it uses the first 3 features in the dataset. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_partial_dependence"},{"location":"API/plots/plot_partial_dependence/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=6) atom.run([\"Tree\", \"Bag\"], metric=\"precision\") atom.plot_partial_dependence() atom.tree.plot_partial_dependence(features=[0, 1, (1, 3)])","title":"Example"},{"location":"API/plots/plot_pca/","text":"plot_pca method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. Only available if PCA was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_pca()","title":"plot_pca"},{"location":"API/plots/plot_pca/#plot_pca","text":"method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. Only available if PCA was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_pca"},{"location":"API/plots/plot_pca/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_pca()","title":"Example"},{"location":"API/plots/plot_permutation_importance/","text":"plot_permutation_importance method plot_permutation_importance (models=None, show=None, n_repeats=10, title=None, figsize=None, filename=None, display=True) [source] Plot the feature permutation importance of models. 
Calculating all permutations can be time-consuming, especially if n_repeats is high. They are stored under the attribute permutations . This means that if a plot is repeated for the same model with the same n_repeats , it will be considerably faster. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. n_repeats: int, optional (default=10) Number of times to permute each feature. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"LDA\"], metric=\"average_precision\") atom.lda.plot_permutation_importance(show=10, n_repeats=7)","title":"plot_permutation_importance"},{"location":"API/plots/plot_permutation_importance/#plot_permutation_importance","text":"method plot_permutation_importance (models=None, show=None, n_repeats=10, title=None, figsize=None, filename=None, display=True) [source] Plot the feature permutation importance of models. Calculating all permutations can be time-consuming, especially if n_repeats is high. They are stored under the attribute permutations . This means that if a plot is repeated for the same model with the same n_repeats , it will be considerably faster. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. n_repeats: int, optional (default=10) Number of times to permute each feature. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_permutation_importance"},{"location":"API/plots/plot_permutation_importance/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"LDA\"], metric=\"average_precision\") atom.lda.plot_permutation_importance(show=10, n_repeats=7)","title":"Example"},{"location":"API/plots/plot_pipeline/","text":"plot_pipeline method plot_pipeline (show_params=True, branch=None, title=None, figsize=None, filename=None, display=True) [source] Plot a diagram of every estimator in a branch. Parameters: show_params: bool, optional (default=True) Whether to show the parameters used for every estimator. branch: str or None, optional (default=None) Name of the branch to plot. If None, plot the current branch. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. 
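To make the caching claim above tangible, here is a hedged sketch that times two identical calls; the actual speed-up depends on the data and estimator, and the timing harness is plain Python rather than part of atom's API.

```python
import time

from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run("LDA", metric="average_precision")

start = time.perf_counter()
atom.lda.plot_permutation_importance(show=10, n_repeats=7)  # Computes permutations
first = time.perf_counter() - start

start = time.perf_counter()
atom.lda.plot_permutation_importance(show=10, n_repeats=7)  # Reuses stored permutations
second = time.perf_counter() - start

print(f"first call: {first:.2f}s, second call: {second:.2f}s")
```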
figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the length of the pipeline. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"median\", strat_cat=\"drop\", min_frac_rows=0.8) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.02) atom.prune(strategy=\"drop\", max_sigma=4, include_target=False) atom.feature_selection( strategy=\"PCA\", n_features=10, max_frac_repeated=1., max_correlation=0.7 ) atom.plot_pipeline()","title":"plot_pipeline"},{"location":"API/plots/plot_pipeline/#plot_pipeline","text":"method plot_pipeline (show_params=True, branch=None, title=None, figsize=None, filename=None, display=True) [source] Plot a diagram of every estimator in a branch. Parameters: show_params: bool, optional (default=True) Whether to show the parameters used for every estimator. branch: str or None, optional (default=None) Name of the branch to plot. If None, plot the current branch. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the length of the pipeline. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_pipeline"},{"location":"API/plots/plot_pipeline/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"median\", strat_cat=\"drop\", min_frac_rows=0.8) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.02) atom.prune(strategy=\"drop\", max_sigma=4, include_target=False) atom.feature_selection( strategy=\"PCA\", n_features=10, max_frac_repeated=1., max_correlation=0.7 ) atom.plot_pipeline()","title":"Example"},{"location":"API/plots/plot_prc/","text":"plot_prc method plot_prc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the precision-recall curve. The legend shows the average precision (AP) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"average_precision\") atom.plot_prc()","title":"plot_prc"},{"location":"API/plots/plot_prc/#plot_prc","text":"method plot_prc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the precision-recall curve. The legend shows the average precision (AP) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. 
dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_prc"},{"location":"API/plots/plot_prc/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"average_precision\") atom.plot_prc()","title":"Example"},{"location":"API/plots/plot_probabilities/","text":"plot_probabilities method plot_probabilities (models=None, dataset=\"test\", target=1, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the probability distribution of the classes in the target column. Only for classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". target: int or str, optional (default=1) Probability of being that class in the target column as index or name. Only for multiclass classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y=\"RainTomorrow\") atom.run(\"rf\") atom.plot_probabilities()","title":"plot_probabilities"},{"location":"API/plots/plot_probabilities/#plot_probabilities","text":"method plot_probabilities (models=None, dataset=\"test\", target=1, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the probability distribution of the classes in the target column. Only for classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". target: int or str, optional (default=1) Probability of being that class in the target column as index or name. Only for multiclass classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_probabilities"},{"location":"API/plots/plot_probabilities/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y=\"RainTomorrow\") atom.run(\"rf\") atom.plot_probabilities()","title":"Example"},{"location":"API/plots/plot_qq/","text":"plot_qq method plot_qq (columns=0, distribution=\"norm\", title=None, figsize=None, filename=None, display=True) [source] Plot a quantile-quantile plot. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. 
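As a sketch of the target option of plot_probabilities, the class can be selected by name as well as by index. The class label "Yes" below is an assumption about the example dataset, not something stated in the docs:

from atom import ATOMClassifier

atom = ATOMClassifier(X, y="RainTomorrow")
atom.run("RF")
atom.plot_probabilities(dataset="both", target="Yes")  # "Yes" is an assumed class label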
Selected categorical columns are ignored. distribution: str, sequence or None, optional (default=\"norm\") Name of the scipy.stats distribution to fit to the columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_qq(columns=[0, 1], distribution=\"triang\")","title":"plot_qq"},{"location":"API/plots/plot_qq/#plot_qq","text":"method plot_qq (columns=0, distribution=\"norm\", title=None, figsize=None, filename=None, display=True) [source] Plot a quantile-quantile plot. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. Selected categorical columns are ignored. distribution: str, sequence or None, optional (default=\"norm\") Name of the scipy.stats distribution to fit to the columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_qq"},{"location":"API/plots/plot_qq/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_qq(columns=[0, 1], distribution=\"triang\")","title":"Example"},{"location":"API/plots/plot_residuals/","text":"plot_residuals method plot_residuals (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the horizontal axis. The gray, intersected line shows the identity line. This plot can be useful to analyze the variance of the error of the regressor. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_residuals()","title":"plot_residuals"},{"location":"API/plots/plot_residuals/#plot_residuals","text":"method plot_residuals (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the horizontal axis. The gray, intersected line shows the identity line. 
This plot can be useful to analyze the variance of the error of the regressor. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_residuals"},{"location":"API/plots/plot_residuals/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_residuals()","title":"Example"},{"location":"API/plots/plot_results/","text":"plot_results method plot_results (models=None, metric=0, title=None, figsize=None, filename=None, display=True) [source] Plot of the model results after the evaluation. If all models applied bagging, the plot is a boxplot. If not, the plot is a barplot. Models are ordered based on their score from the top down. The score is either the mean_bagging or metric_test attribute of the model, selected in that order. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of models. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=5) atom.plot_results() # With bagging... # And without bagging... atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=0) atom.plot_results()","title":"plot_results"},{"location":"API/plots/plot_results/#plot_results","text":"method plot_results (models=None, metric=0, title=None, figsize=None, filename=None, display=True) [source] Plot of the model results after the evaluation. If all models applied bagging, the plot is a boxplot. If not, the plot is a barplot. Models are ordered based on their score from the top down. The score is either the mean_bagging or metric_test attribute of the model, selected in that order. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of models. filename: str or None, optional (default=None) Name of the file. 
If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_results"},{"location":"API/plots/plot_results/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=5) atom.plot_results() # With bagging... # And without bagging... atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=0) atom.plot_results()","title":"Example"},{"location":"API/plots/plot_rfecv/","text":"plot_rfecv method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every subset of the dataset. Only available if RFECV was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"RFECV\", solver=\"LGB\", scoring=\"precision\") atom.plot_rfecv()","title":"plot_rfecv"},{"location":"API/plots/plot_rfecv/#plot_rfecv","text":"method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every subset of the dataset. Only available if RFECV was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_rfecv"},{"location":"API/plots/plot_rfecv/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"RFECV\", solver=\"LGB\", scoring=\"precision\") atom.plot_rfecv()","title":"Example"},{"location":"API/plots/plot_roc/","text":"plot_roc method plot_roc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the Receiver Operating Characteristics curve. The legend shows the Area Under the ROC Curve (AUC) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
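A minimal sketch of a multi-metric run and the metric argument of plot_results; the model and metric names are taken from the examples above and the data is assumed to be loaded:

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(["Tree", "RF", "LGB"], metric=["f1", "recall"], bagging=5)
atom.plot_results(metric="recall")  # boxplot of the bagging scores for the second metric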
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_roc(filename=\"roc_curve\")","title":"plot_roc"},{"location":"API/plots/plot_roc/#plot_roc","text":"method plot_roc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the Receiver Operating Characteristics curve. The legend shows the Area Under the ROC Curve (AUC) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_roc"},{"location":"API/plots/plot_roc/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_roc(filename=\"roc_curve\")","title":"Example"},{"location":"API/plots/plot_scatter_matrix/","text":"plot_scatter_matrix method plot_scatter_matrix (columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs) [source] Plot a matrix of scatter plots. A subset of max 250 random samples are selected from every column to not clutter the plot. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's pairplot . Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_scatter_matrix(columns=slice(0, 5))","title":"plot_scatter_matrix"},{"location":"API/plots/plot_scatter_matrix/#plot_scatter_matrix","text":"method plot_scatter_matrix (columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs) [source] Plot a matrix of scatter plots. A subset of max 250 random samples are selected from every column to not clutter the plot. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
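Since the **kwargs of plot_scatter_matrix are forwarded to seaborn's pairplot, any of its arguments (e.g. diag_kind) can be passed along. A minimal sketch, assuming X and y are already loaded:

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.plot_scatter_matrix(columns=slice(0, 4), diag_kind="kde")  # diag_kind is forwarded to pairplot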
**kwargs Additional keyword arguments for seaborn's pairplot .","title":"plot_scatter_matrix"},{"location":"API/plots/plot_scatter_matrix/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_scatter_matrix(columns=slice(0, 5))","title":"Example"},{"location":"API/plots/plot_successive_halving/","text":"plot_successive_halving method plot_successive_halving (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using successive halving . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.successive_halving([\"bag\", \"adab\", \"et\", \"lgb\"], metric=\"accuracy\", bagging=5) atom.plot_successive_halving(filename=\"plot_successive_halving\")","title":"plot_successive_halving"},{"location":"API/plots/plot_successive_halving/#plot_successive_halving","text":"method plot_successive_halving (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using successive halving . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_successive_halving"},{"location":"API/plots/plot_successive_halving/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.successive_halving([\"bag\", \"adab\", \"et\", \"lgb\"], metric=\"accuracy\", bagging=5) atom.plot_successive_halving(filename=\"plot_successive_halving\")","title":"Example"},{"location":"API/plots/plot_threshold/","text":"plot_threshold method plot_threshold (models=None, metric=None, dataset=\"test\", steps=100, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot metric performances against threshold values. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: str, callable, list, tuple or None, optional (default=None) Metric(s) to plot. These can be one of sklearn's predefined scorers, a metric function or a sklearn scorer object (see the user guide ). If None, the metric used to run the pipeline is used. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. 
Options are \"train\", \"test\" or \"both\". steps: int, optional (default=100) Number of thresholds measured. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier from sklearn.metrics import recall_score atom = ATOMClassifier(X, y) atom.run(\"LGB\") atom.plot_threshold(metric=[\"accuracy\", \"f1\", recall_score])","title":"plot_threshold"},{"location":"API/plots/plot_threshold/#plot_threshold","text":"method plot_threshold (models=None, metric=None, dataset=\"test\", steps=100, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot metric performances against threshold values. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: str, callable, list, tuple or None, optional (default=None) Metric(s) to plot. These can be one of sklearn's predefined scorers, a metric function or a sklearn scorer object (see the user guide ). If None, the metric used to run the pipeline is used. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". steps: int, optional (default=100) Number of thresholds measured. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_threshold"},{"location":"API/plots/plot_threshold/#example","text":"from atom import ATOMClassifier from sklearn.metrics import recall_score atom = ATOMClassifier(X, y) atom.run(\"LGB\") atom.plot_threshold(metric=[\"accuracy\", \"f1\", recall_score])","title":"Example"},{"location":"API/plots/scatter_plot/","text":"scatter_plot method scatter_plot (models=None, index=None, feature=0, target=1, title=None, figsize=(10, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and the SHAP value of the same feature on the y-axis. This shows how the model depends on the given feature, and is like a richer extension of the classical partial dependence plots. Vertical dispersion of the data points represents interaction effects. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The scatter plot does not support plotting a single sample. feature: int or str, optional (default=0) Index or name of the feature to plot. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. 
If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's scatter plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.scatter_plot(feature=\"bmi\")","title":"scatter_plot"},{"location":"API/plots/scatter_plot/#scatter_plot","text":"method scatter_plot (models=None, index=None, feature=0, target=1, title=None, figsize=(10, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and the SHAP value of the same feature on the y-axis. This shows how the model depends on the given feature, and is like a richer extension of the classical partial dependence plots. Vertical dispersion of the data points represents interaction effects. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The scatter plot does not support plotting a single sample. feature: int or str, optional (default=0) Index or name of the feature to plot. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's scatter plot .","title":"scatter_plot"},{"location":"API/plots/scatter_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.scatter_plot(feature=\"bmi\")","title":"Example"},{"location":"API/plots/waterfall_plot/","text":"waterfall_plot method waterfall_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True) [source] Plot SHAP's waterfall plot for a single prediction. The SHAP value of a feature represents the impact of the evidence provided by that feature on the model\u2019s output. The waterfall plot is designed to visually display how the SHAP values (evidence) of each feature move the model output from our prior expectation under the background data distribution, to the final model prediction given the evidence of all the features. Features are sorted by the magnitude of their SHAP values with the smallest magnitude features grouped together at the bottom of the plot when the number of features in the models exceeds the show parameter. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. 
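Because selecting multiple models raises an exception, scatter_plot is typically called from a single model; the index tuple below is an illustrative choice that selects rows 0 until 100 (the rf attribute name follows the pattern of the other model examples):

from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
atom.run("RF")
atom.rf.scatter_plot(index=(0, 100), feature="bmi")  # called from the model to avoid the multi-model error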
index: int or None, optional (default=None) Index of the row in the dataset to plot. If None, it selects the first row in the test set. The waterfall plot does not support plotting multiple samples. show: int or None, optional (default=None) Number of features to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"Tree\") atom.tree.waterfall_plot(index=120)","title":"waterfall_plot"},{"location":"API/plots/waterfall_plot/#waterfall_plot","text":"method waterfall_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True) [source] Plot SHAP's waterfall plot for a single prediction. The SHAP value of a feature represents the impact of the evidence provided by that feature on the model\u2019s output. The waterfall plot is designed to visually display how the SHAP values (evidence) of each feature move the model output from our prior expectation under the background data distribution, to the final model prediction given the evidence of all the features. Features are sorted by the magnitude of their SHAP values with the smallest magnitude features grouped together at the bottom of the plot when the number of features in the models exceeds the show parameter. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int or None, optional (default=None) Index of the row in the dataset to plot. If None, it selects the first row in the test set. The waterfall plot does not support plotting multiple samples. show: int or None, optional (default=None) Number of features to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"waterfall_plot"},{"location":"API/plots/waterfall_plot/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"Tree\") atom.tree.waterfall_plot(index=120)","title":"Example"},{"location":"API/predicting/decision_function/","text":"decision_function method decision_function (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). 
If called from a model, it will use that model. The estimator must have a decision_function method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted confidence scores of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"kSVM\", metric=\"accuracy\") # Predict confidence scores on new data predictions = atom.ksvm.decision_function(X_new)","title":"decision_function"},{"location":"API/predicting/decision_function/#decision_function","text":"method decision_function (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a decision_function method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted confidence scores of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"decision_function"},{"location":"API/predicting/decision_function/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"kSVM\", metric=\"accuracy\") # Predict confidence scores on new data predictions = atom.ksvm.decision_function(X_new)","title":"Example"},{"location":"API/predicting/predict/","text":"predict method predict (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. 
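The pipeline parameter described above controls which transformers are applied before predicting. A minimal sketch, assuming X_new is unseen data with the same columns as X; the index sequence [0, 1] is an illustrative choice:

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.impute(strat_num="median", strat_cat="drop")
atom.encode(strategy="LeaveOneOut")
atom.run("LGB")

atom.lgb.predict(X_new)                   # None: only transformers applied on the whole dataset
atom.lgb.predict(X_new, pipeline=True)    # use every transformer in the pipeline
atom.lgb.predict(X_new, pipeline=False)   # skip all transformers
atom.lgb.predict(X_new, pipeline=[0, 1])  # only the first two transformers (illustrative indices)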
If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted targets with shape=(n_samples,). Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict(X_new)","title":"predict"},{"location":"API/predicting/predict/#predict","text":"method predict (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted targets with shape=(n_samples,).","title":"predict"},{"location":"API/predicting/predict/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict(X_new)","title":"Example"},{"location":"API/predicting/predict_log_proba/","text":"predict_log_proba method predict_log_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_log_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray The class log-probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_log_proba(X_new)","title":"predict_log_proba"},{"location":"API/predicting/predict_log_proba/#predict_log_proba","text":"method predict_log_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_log_proba method. 
Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray The class log-probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"predict_log_proba"},{"location":"API/predicting/predict_log_proba/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_log_proba(X_new)","title":"Example"},{"location":"API/predicting/predict_proba/","text":"predict_proba method predict_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray The class probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_proba(X_new)","title":"predict_proba"},{"location":"API/predicting/predict_proba/#predict_proba","text":"method predict_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. 
Returns: p: np.ndarray The class probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"predict_proba"},{"location":"API/predicting/predict_proba/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_proba(X_new)","title":"Example"},{"location":"API/predicting/score/","text":"score method score (X, y, sample_weights=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a score method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). sample_weights: sequence or None, optional (default=None) Sample weights corresponding to y. pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: score: np.float64 Mean accuracy or r2 (depending on the task) of self.predict(X) with respect to y. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"MNB\", \"KNN\", \"kSVM\"], metric=\"precision\") # Get the mean accuracy on new data predictions = atom.mnb.score(X_new, y_new)","title":"score"},{"location":"API/predicting/score/#score","text":"method score (X, y, sample_weights=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a score method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). sample_weights: sequence or None, optional (default=None) Sample weights corresponding to y. pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. 
Returns: score: np.float64 Mean accuracy or r2 (depending on the task) of self.predict(X) with respect to y.","title":"score"},{"location":"API/predicting/score/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"MNB\", \"KNN\", \"kSVM\"], metric=\"precision\") # Get the mean accuracy on new data predictions = atom.mnb.score(X_new, y_new)","title":"Example"},{"location":"API/predicting/transform/","text":"transform method transform (X, y=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the pipeline parameter to customize this behaviour. This method can only be called from atom, not from the models. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Features to transform, with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored in the transformers. If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean() atom.impute(strat_num=\"knn\", strat_cat=\"drop\") atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2) # Transform new data through all data cleaning steps X_transformed = atom.transform(X_new)","title":"transform"},{"location":"API/predicting/transform/#transform","text":"method transform (X, y=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the pipeline parameter to customize this behaviour. This method can only be called from atom, not from the models. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Features to transform, with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored in the transformers. If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. 
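When y is provided, transform returns the transformed target as well, which is useful for preparing new labelled data outside atom. A minimal sketch, assuming X_new and y_new are new, labelled data:

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.clean()
atom.impute(strat_num="knn", strat_cat="drop")

X_t, y_t = atom.transform(X_new, y_new)  # y is returned only because it was provided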
Only returned if provided.","title":"transform"},{"location":"API/predicting/transform/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean() atom.impute(strat_num=\"knn\", strat_cat=\"drop\") atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2) # Transform new data through all data cleaning steps X_transformed = atom.transform(X_new)","title":"Example"},{"location":"API/training/directclassifier/","text":"DirectClassifier class atom.training. DirectClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluates the models to the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. 
This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split into a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Any additional keyword arguments are passed to skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. 
logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. 
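As a sketch of how the utility attributes above are typically inspected after a run (train and test are assumed to be prepared data sets; the bagging columns are only present because bagging was used):

from atom.training import DirectClassifier

trainer = DirectClassifier(["Tree", "RF", "LGB"], metric="f1", bagging=5)
trainer.run(train, test)

print(trainer.results[["metric_test", "mean_bagging"]])  # columns listed under the results attribute
print(trainer.winner)                                    # model that performed best on the test set
print(trainer.errors)                                    # exceptions raised during fitting, if any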
The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. 
Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import DirectClassifier # Run the pipeline trainer = DirectClassifier([\"Tree\", \"RF\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.scoring(\"auc\") trainer.Tree.plot_bo()","title":"DirectClassifier"},{"location":"API/training/directclassifier/#directclassifier","text":"class atom.training. DirectClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline. 
The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. 
est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Any additional keyword arguments are passed to skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"DirectClassifier"},{"location":"API/training/directclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/directclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. 
trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/directclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/directclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/directclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. 
Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/directclassifier/#example","text":"from atom.training import DirectClassifier # Run the pipeline trainer = DirectClassifier([\"Tree\", \"RF\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.scoring(\"auc\") trainer.Tree.plot_bo()","title":"Example"},{"location":"API/training/directregressor/","text":"DirectRegressor class atom.training. DirectRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. 
delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Any additional keyword arguments are passed to skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
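For illustration, a minimal sketch of how these data attributes might be inspected and updated on a fitted trainer; the trainer variable and the pre-split data are assumptions of the example, not part of the API:
# Inspect the data attributes of a trainer that has already been run.
print(trainer.shape)      # (n_rows, n_columns) of the complete dataset
print(trainer.columns)    # column names, including the target
# Updating one data attribute automatically updates the others.
trainer.test = trainer.test.drop(trainer.test.index[0])   # drop the first test row
print(trainer.dataset.shape)                               # reflects the smaller test set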
Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. 
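As a small, hedged sketch of how the plot attributes and reset_aesthetics work together (the attribute values shown are illustrative assumptions only):
# Tweak the shared plot aesthetics on the trainer...
trainer.style = \"whitegrid\"      # any seaborn style
trainer.palette = \"Blues\"        # any seaborn palette
trainer.title_fontsize = 24
# ...and restore the defaults once you are done experimenting.
trainer.reset_aesthetics()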
method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import DirectRegressor # Run the pipeline trainer = DirectRegressor([\"OLS\", \"BR\"], n_calls=5, n_initial_points=3, bagging=5) trainer.run(train, test) # Analyze the results trainer.scoring(\"mse\") trainer.plot_results()","title":"DirectRegressor"},{"location":"API/training/directregressor/#directregressor","text":"class atom.training. 
DirectRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. 
If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Any additional keyword arguments are passed to skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. 
If None, the random number generator is the RandomState instance used by numpy.random .","title":"DirectRegressor"},{"location":"API/training/directregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/directregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/directregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/directregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/directregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. 
See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. 
If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/directregressor/#example","text":"from atom.training import DirectRegressor # Run the pipeline trainer = DirectRegressor([\"OLS\", \"BR\"], n_calls=5, n_initial_points=3, bagging=5) trainer.run(train, test) # Analyze the results trainer.scoring(\"mse\") trainer.plot_results()","title":"Example"},{"location":"API/training/successivehalvingclassifier/","text":"SuccessiveHalvingClassifier class atom.training. SuccessiveHalvingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. 
Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip the last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. 
dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. 
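For illustration, a minimal sketch of how the bo_params dictionary and these attributes fit together (assuming train and test are predefined data sets, as in the examples on this page):

from atom.training import SuccessiveHalvingClassifier

# Bayesian optimization settings passed through bo_params
bo_params = {"base_estimator": "RF", "max_time": 600, "cv": 3}

# Run two models with 10 BO calls each and 4 bagging sets
trainer = SuccessiveHalvingClassifier(
    ["Tree", "RF"],
    metric="f1",
    n_calls=10,
    n_initial_points=5,
    bo_params=bo_params,
    bagging=4,
    random_state=1,
)
trainer.run(train, test)

# Inspect the data and utility attributes
print(trainer.train.shape)  # training set of the pipeline
print(trainer.results)      # scores per model and run
print(trainer.winner)       # model that performed best on the test set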
Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. 
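As a sketch of how the returned weights can be used (an assumption for illustration: both selected models accept scikit-learn's class_weight parameter, forwarded here through est_params):

from atom.training import SuccessiveHalvingClassifier

trainer = SuccessiveHalvingClassifier(["LR", "RF"], metric="f1")
trainer.run(train, test)

# Weights inversely proportional to the class frequencies in the training set
weights = trainer.get_class_weight("train")  # e.g. {0: 0.8, 1: 1.3} (hypothetical values)

# Re-run with the weights passed to the estimators via est_params
balanced = SuccessiveHalvingClassifier(
    ["LR", "RF"],
    metric="f1",
    est_params={"LR": {"class_weight": weights}, "RF": {"class_weight": weights}},
)
balanced.run(train, test)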
method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. 
Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import SuccessiveHalvingClassifier # Run the pipeline trainer = SuccessiveHalvingClassifier([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"SuccessiveHalvingClassifier"},{"location":"API/training/successivehalvingclassifier/#successivehalvingclassifier","text":"class atom.training. SuccessiveHalvingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. 
needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. 
If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"SuccessiveHalvingClassifier"},{"location":"API/training/successivehalvingclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/successivehalvingclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/successivehalvingclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/successivehalvingclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. 
tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/successivehalvingclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. 
weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/successivehalvingclassifier/#example","text":"from atom.training import SuccessiveHalvingClassifier # Run the pipeline trainer = SuccessiveHalvingClassifier([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"Example"},{"location":"API/training/successivehalvingregressor/","text":"SuccessiveHalvingRegressor class atom.training. SuccessiveHalvingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. 
needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. 
verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. 
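A short sketch of how some of these methods can be chained after a run (assuming train and test are predefined regression data sets, as in the example further down):

from atom.training import SuccessiveHalvingRegressor

trainer = SuccessiveHalvingRegressor(["OLS", "Tree", "RF", "ET"], n_calls=5, n_initial_points=3)
trainer.run(train, test)

# Compare the models on a specific metric
trainer.scoring("mse")

# Combine the fitted models in a Stacking instance with Ridge as final estimator
trainer.stacking(estimator="Ridge")

# Save the trainer without the data to keep the file small
trainer.save("trainer", save_data=False)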
method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. 
Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import SuccessiveHalvingRegressor # Run the pipeline trainer = SuccessiveHalvingRegressor([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"SuccessiveHalvingRegressor"},{"location":"API/training/successivehalvingregressor/#successivehalvingregressor","text":"class atom.training. SuccessiveHalvingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. 
Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. 
dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"SuccessiveHalvingRegressor"},{"location":"API/training/successivehalvingregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/successivehalvingregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/successivehalvingregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. 
metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/successivehalvingregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/successivehalvingregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. 
Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import SuccessiveHalvingRegressor # Run the pipeline trainer = SuccessiveHalvingRegressor([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"Methods"},{"location":"API/training/successivehalvingregressor/#example","text":"from atom.training import SuccessiveHalvingRegressor # Run the pipeline trainer = SuccessiveHalvingRegressor([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"Example"},{"location":"API/training/trainsizingclassifier/","text":"TrainSizingClassifier class atom.training. TrainSizingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. 
The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. 
n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . 
Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. 
Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: TrainSizingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import TrainSizingClassifier # Run the pipeline trainer = TrainSizingClassifier(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"TrainSizingClassifier"},{"location":"API/training/trainsizingclassifier/#trainsizingclassifier","text":"class atom.training. TrainSizingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. 
bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"TrainSizingClassifier"},{"location":"API/training/trainsizingclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/trainsizingclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/trainsizingclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/trainsizingclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/trainsizingclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. 
The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. 
\"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/trainsizingclassifier/#example","text":"from atom.training import TrainSizingClassifier # Run the pipeline trainer = TrainSizingClassifier(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"Example"},{"location":"API/training/trainsizingregressor/","text":"TrainSizingRegressor class atom.training. TrainSizingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. 
Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. 
n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. 
level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: TrainSizingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import TrainSizingRegressor # Run the pipeline trainer = TrainSizingRegressor(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"TrainSizingRegressor"},{"location":"API/training/trainsizingregressor/#trainsizingregressor","text":"class atom.training. 
TrainSizingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . 
If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. 
If None, the random number generator is the RandomState instance used by numpy.random .","title":"TrainSizingRegressor"},{"location":"API/training/trainsizingregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/trainsizingregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/trainsizingregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/trainsizingregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/trainsizingregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. 
See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Names of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scores for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: TrainSizingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators.
If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/trainsizingregressor/#example","text":"from atom.training import TrainSizingRegressor # Run the pipeline trainer = TrainSizingRegressor(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"Example"}]} \ No newline at end of file +{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Automated Tool for Optimized Modelling A Python package for fast exploration of machine learning pipelines During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. Example steps taken by ATOM's pipeline: Data Cleaning Handle missing values Encode categorical features Detect and remove outliers Balance the training set Feature engineering Create new non-linear features Remove multi-collinear features Remove features with too low variance Select the most promising features Train and validate multiple models Select hyperparameters using a Bayesian Optimization approach Train and test the models on the provided data Assess the robustness of the output using a bagging algorithm Analyze the results Get the model scores on various metrics Make plots to compare the model performances Figure 1. 
Diagram of the possible steps taken by ATOM. Release history Version 4.4.0 The drop method now allows the user to drop columns as part of the pipeline. It is now possible to add data transformations as function to the pipeline through the apply method. Added the status method to save an overview of atom's branches and models to the logger. Improved the output messages for the Imputer class. The dataset's columns can now be called directly from atom. The distribution and plot_distribution methods now ignore missing values instead of raising an exception. Fixed a bug where transformations failed when columns were added after initializing the pipeline. Fixed a bug where the Cleaner class didn't drop columns with only missing values for minimum_cardinality=True . Fixed a bug where the winning model wasn't displayed correctly. Refactored the way transformers are added or removed from predicting methods. Improved documentation. Version 4.3.0 Possibility to add custom transformers to the pipeline. The export_pipeline utility method exports atom's current pipeline to a sklearn object. Use AutoML to automate the search for an optimized pipeline. New magic methods makes atom behave similarly to sklearn's Pipeline . All training approaches can now be combined in the same atom instance. New plot_scatter_matrix , plot_distribution and plot_qq for data inspection. Complete rework of all the shap plots to be consistent with their new API. Improvements for the Scaler and Pruner classes. The acronym for custom models now defaults to the capital letters in the class' __name__. Possibility to apply transformations on only a subset of the columns. Plots and methods now accept winner as model name. Fixed a bug where custom metrics didn't show the correct name. Fixed a bug where timers were not displayed correctly. Further compatibility with deep learning datasets. Large refactoring for performance optimization. Cleaner output of messages to the logger. Plots no longer show a default title. Added the AutoML example notebook. Minor bug fixes. Version 4.2.1 Bug fix where there was memory leakage in successive halving and train sizing pipelines. The XGBoost , LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models , e.g. pip install -U atom-ml[models] . Improved documentation. Version 4.2.0 Possibility to add custom models to the pipeline using ATOMModel . Compatibility with deep learning models. New branch system for different data pipelines. Read more in the user guide . Use the canvas contextmanager to draw multiple plots in one figure. New voting and stacking ensemble techniques. New get_class_weight utility method. New Sequential Feature Selection strategy for the FeatureSelector . Added the sample_weight parameter to the score method. New ways to initialize the data in the training instances. The n_rows parameter in ATOMLoader is deprecated in favour of the new data input formats. The test_size parameter now also allows integer values. Renamed categories to classes to be consistent with sklearn's API. The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset. Possibility to add custom parameters to an estimator's fit method through est_params . Successive halving and train sizing now both allow subsequent runs from atom without losing previous information. Bug fix where ATOMLoader wouldn't encode the target column during transformation. 
Added the Deep learning , Ensembles and Utilities example notebooks. Compatibility with python 3.9 . Version 4.1.0 Added the est_params parameter to customize the parameters passed to every model's estimator. Following skopt's API, the n_random_starts parameter is deprecated in favour of n_initial_points . The Balancer class now allows you to use any of the strategies from imblearn . New utility attributes to inspect the dataset. Four new models: CatNB , CNB , ARD and RNN . Added the models section to the documentation. Small changes in log outputs. Bug fixes and performance improvements. Version 4.0.1 Bug fix where the DFS strategy in FeatureGenerator was not deterministic for a fixed random state. Bug fix where subsequent runs with the same metric failed. Added the license file to the package's installer. Typo fixes in documentation. Version 4.0.0 Bayesian optimization package changed from GpyOpt to skopt . Complete revision of the model's hyperparameters. Four SHAP plots can now be called directly from an ATOM pipeline. Two new plots for regression tasks. New plot_pipeline and pipeline attribute to access all transformers. Possibility to determine transformer parameters per method. New calibration method and plot . Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs). Implementation of multi-metric runs. Possibility to choose which metric to plot. Early stopping for models that allow in-training evaluation. Added the ATOMLoader function to load saved atom instances and directly apply all data transformations. The \"remove\" strategy in the data cleaning parameters is deprecated in favour of \"drop\". Implemented the DFS strategy in FeatureGenerator . All training classes now inherit from BaseEstimator. Added multiple new example notebooks. Tests coverage up to 100%. Completely new documentation page. Bug fixes and performance improvements. 
Content Getting started User guide API ATOM ATOMClassifier ATOMRegressor ATOMLoader ATOMModel Data cleaning Scaler Cleaner Imputer Encoder Pruner Balancer Feature engineering FeatureGenerator FeatureSelector Training Direct DirectClassifier DirectRegressor SuccessiveHalving SuccessiveHalvingClassifier SuccessiveHalvingClassifier TrainSizing TrainSizingClassifier TrainSizingRegressor Models Gaussian Process Gaussian Naive Bayes Multinomial Naive Bayes Bernoulli Naive Bayes Categorical Naive Bayes Complement Naive Bayes Ordinary Least Squares Ridge Lasso Elastic Net Bayesian Ridge Automated Relevance Determination Logistic Regression Linear Discriminant Analysis Quadratic Discriminant Analysis K-Nearest Neighbors Radius Nearest Neighbors Decision Tree Bagging Extra-Trees Random Forest AdaBoost Gradient Boosting Machine XGBoost LightGBM CatBoost Linear-SVM Kernel-SVM Passive Aggressive Stochastic Gradient Descent Multi-layer Perceptron Predicting transform predict predict_proba predict_log_proba decision_function score Plots plot_correlation plot_scatter_matrix plot_distribution plot_qq plot_pipeline plot_pca plot_components plot_rfecv plot_successive_halving plot_learning_curve plot_results plot_bo plot_evals plot_roc plot_prc plot_permutation_importance plot_feature_importance plot_partial_dependence plot_errors plot_residuals plot_confusion_matrix plot_threshold plot_probabilities plot_calibration plot_gains plot_lift bar_plot beeswarm_plot decision_plot force_plot heatmap_plot scatter_plot waterfall_plot Examples AutoML Binary classification Calibration Deep learning Early stopping Ensembles Feature engineering Imbalanced datasets Multiclass classification Multi-metric runs Regression Successive halving Train sizing Utilities FAQ Dependencies License","title":"Home"},{"location":"#automated-tool-for-optimized-modelling","text":"","title":"Automated Tool for Optimized Modelling"},{"location":"#a-python-package-for-fast-exploration-of-machine-learning-pipelines","text":"During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. 
Example steps taken by ATOM's pipeline: Data Cleaning Handle missing values Encode categorical features Detect and remove outliers Balance the training set Feature engineering Create new non-linear features Remove multi-collinear features Remove features with too low variance Select the most promising features Train and validate multiple models Select hyperparameters using a Bayesian Optimization approach Train and test the models on the provided data Assess the robustness of the output using a bagging algorithm Analyze the results Get the model scores on various metrics Make plots to compare the model performances Figure 1. Diagram of the possible steps taken by ATOM.","title":"A Python package for fast exploration of machine learning pipelines"},{"location":"#release-history","text":"","title":"Release history"},{"location":"#version-440","text":"The drop method now allows the user to drop columns as part of the pipeline. It is now possible to add data transformations as function to the pipeline through the apply method. Added the status method to save an overview of atom's branches and models to the logger. Improved the output messages for the Imputer class. The dataset's columns can now be called directly from atom. The distribution and plot_distribution methods now ignore missing values instead of raising an exception. Fixed a bug where transformations failed when columns were added after initializing the pipeline. Fixed a bug where the Cleaner class didn't drop columns with only missing values for minimum_cardinality=True . Fixed a bug where the winning model wasn't displayed correctly. Refactored the way transformers are added or removed from predicting methods. Improved documentation.","title":"Version 4.4.0"},{"location":"#version-430","text":"Possibility to add custom transformers to the pipeline. The export_pipeline utility method exports atom's current pipeline to a sklearn object. Use AutoML to automate the search for an optimized pipeline. New magic methods makes atom behave similarly to sklearn's Pipeline . All training approaches can now be combined in the same atom instance. New plot_scatter_matrix , plot_distribution and plot_qq for data inspection. Complete rework of all the shap plots to be consistent with their new API. Improvements for the Scaler and Pruner classes. The acronym for custom models now defaults to the capital letters in the class' __name__. Possibility to apply transformations on only a subset of the columns. Plots and methods now accept winner as model name. Fixed a bug where custom metrics didn't show the correct name. Fixed a bug where timers were not displayed correctly. Further compatibility with deep learning datasets. Large refactoring for performance optimization. Cleaner output of messages to the logger. Plots no longer show a default title. Added the AutoML example notebook. Minor bug fixes.","title":"Version 4.3.0"},{"location":"#version-421","text":"Bug fix where there was memory leakage in successive halving and train sizing pipelines. The XGBoost , LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models , e.g. pip install -U atom-ml[models] . Improved documentation.","title":"Version 4.2.1"},{"location":"#version-420","text":"Possibility to add custom models to the pipeline using ATOMModel . Compatibility with deep learning models. New branch system for different data pipelines. Read more in the user guide . Use the canvas contextmanager to draw multiple plots in one figure. 
New voting and stacking ensemble techniques. New get_class_weight utility method. New Sequential Feature Selection strategy for the FeatureSelector . Added the sample_weight parameter to the score method. New ways to initialize the data in the training instances. The n_rows parameter in ATOMLoader is deprecated in favour of the new data input formats. The test_size parameter now also allows integer values. Renamed categories to classes to be consistent with sklearn's API. The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset. Possibility to add custom parameters to an estimator's fit method through est_params . Successive halving and train sizing now both allow subsequent runs from atom without losing previous information. Bug fix where ATOMLoader wouldn't encode the target column during transformation. Added the Deep learning , Ensembles and Utilities example notebooks. Compatibility with python 3.9 .","title":"Version 4.2.0"},{"location":"#version-410","text":"Added the est_params parameter to customize the parameters passed to every model's estimator. Following skopt's API, the n_random_starts parameter is deprecated in favour of n_initial_points . The Balancer class now allows you to use any of the strategies from imblearn . New utility attributes to inspect the dataset. Four new models: CatNB , CNB , ARD and RNN . Added the models section to the documentation. Small changes in log outputs. Bug fixes and performance improvements.","title":"Version 4.1.0"},{"location":"#version-401","text":"Bug fix where the DFS strategy in FeatureGenerator was not deterministic for a fixed random state. Bug fix where subsequent runs with the same metric failed. Added the license file to the package's installer. Typo fixes in documentation.","title":"Version 4.0.1"},{"location":"#version-400","text":"Bayesian optimization package changed from GpyOpt to skopt . Complete revision of the model's hyperparameters. Four SHAP plots can now be called directly from an ATOM pipeline. Two new plots for regression tasks. New plot_pipeline and pipeline attribute to access all transformers. Possibility to determine transformer parameters per method. New calibration method and plot . Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs). Implementation of multi-metric runs. Possibility to choose which metric to plot. Early stopping for models that allow in-training evaluation. Added the ATOMLoader function to load saved atom instances and directly apply all data transformations. The \"remove\" strategy in the data cleaning parameters is deprecated in favour of \"drop\". Implemented the DFS strategy in FeatureGenerator . All training classes now inherit from BaseEstimator. Added multiple new example notebooks. Tests coverage up to 100%. Completely new documentation page. 
Bug fixes and performance improvements.","title":"Version 4.0.0"},{"location":"#content","text":"Getting started User guide API ATOM ATOMClassifier ATOMRegressor ATOMLoader ATOMModel Data cleaning Scaler Cleaner Imputer Encoder Pruner Balancer Feature engineering FeatureGenerator FeatureSelector Training Direct DirectClassifier DirectRegressor SuccessiveHalving SuccessiveHalvingClassifier SuccessiveHalvingClassifier TrainSizing TrainSizingClassifier TrainSizingRegressor Models Gaussian Process Gaussian Naive Bayes Multinomial Naive Bayes Bernoulli Naive Bayes Categorical Naive Bayes Complement Naive Bayes Ordinary Least Squares Ridge Lasso Elastic Net Bayesian Ridge Automated Relevance Determination Logistic Regression Linear Discriminant Analysis Quadratic Discriminant Analysis K-Nearest Neighbors Radius Nearest Neighbors Decision Tree Bagging Extra-Trees Random Forest AdaBoost Gradient Boosting Machine XGBoost LightGBM CatBoost Linear-SVM Kernel-SVM Passive Aggressive Stochastic Gradient Descent Multi-layer Perceptron Predicting transform predict predict_proba predict_log_proba decision_function score Plots plot_correlation plot_scatter_matrix plot_distribution plot_qq plot_pipeline plot_pca plot_components plot_rfecv plot_successive_halving plot_learning_curve plot_results plot_bo plot_evals plot_roc plot_prc plot_permutation_importance plot_feature_importance plot_partial_dependence plot_errors plot_residuals plot_confusion_matrix plot_threshold plot_probabilities plot_calibration plot_gains plot_lift bar_plot beeswarm_plot decision_plot force_plot heatmap_plot scatter_plot waterfall_plot Examples AutoML Binary classification Calibration Deep learning Early stopping Ensembles Feature engineering Imbalanced datasets Multiclass classification Multi-metric runs Regression Successive halving Train sizing Utilities FAQ Dependencies License","title":"Content"},{"location":"dependencies/","text":"Python As of the moment, ATOM supports Python 3.6 , 3.7 , 3.8 and 3.9 . Packages ATOM is built on top of several existing Python libraries. The required packages are necessary for it's correct functioning. Additionally, you can install some optional packages to use machine learning estimators not provided by sklearn. Required numpy (>=1.19.5) scipy (>=1.4.1) pandas (>=1.0.3) dill (>=0.3.3) tqdm (>=4.35.0) joblib (>=0.16.0) typeguard (>=2.7.1) tabulate (>=0.8.6) scikit-learn (>=0.24) scikit-optimize (>=0.7.4) tpot (>=0.11.7) pandas-profiling (>=2.3.0) category-encoders (>=2.1.0) imbalanced-learn (>=0.5.0) featuretools (>=0.17.0) gplearn (>=0.4.1) matplotlib (>=3.3.0) seaborn (>=0.9.0) shap (>=0.38.1) Optional xgboost (>=0.90) lightgbm (>=2.3.0) catboost (>=0.19.1) Support ATOM recognizes the support from JetBrains by providing core project contributors with a set of developer tools free of charge.","title":"Dependencies"},{"location":"dependencies/#python","text":"As of the moment, ATOM supports Python 3.6 , 3.7 , 3.8 and 3.9 .","title":"Python"},{"location":"dependencies/#packages","text":"ATOM is built on top of several existing Python libraries. The required packages are necessary for it's correct functioning. 
Additionally, you can install some optional packages to use machine learning estimators not provided by sklearn.","title":"Packages"},{"location":"dependencies/#required","text":"numpy (>=1.19.5) scipy (>=1.4.1) pandas (>=1.0.3) dill (>=0.3.3) tqdm (>=4.35.0) joblib (>=0.16.0) typeguard (>=2.7.1) tabulate (>=0.8.6) scikit-learn (>=0.24) scikit-optimize (>=0.7.4) tpot (>=0.11.7) pandas-profiling (>=2.3.0) category-encoders (>=2.1.0) imbalanced-learn (>=0.5.0) featuretools (>=0.17.0) gplearn (>=0.4.1) matplotlib (>=3.3.0) seaborn (>=0.9.0) shap (>=0.38.1)","title":"Required"},{"location":"dependencies/#optional","text":"xgboost (>=0.90) lightgbm (>=2.3.0) catboost (>=0.19.1)","title":"Optional"},{"location":"dependencies/#support","text":"ATOM recognizes the support from JetBrains by providing core project contributors with a set of developer tools free of charge.","title":"Support"},{"location":"faq/","text":"Frequently asked questions There already is an atom text editor. Does this has anything to do with that? How does ATOM relate to AutoML? Is it possible to run deep learning models? Can I run atom's methods on just a subset of the columns? How can I compare the same model on different datasets? Can I train models through atom using a GPU? How are numerical and categorical columns differentiated? Can I run unsupervised learning pipelines? Is there a way to plot multiple models in the same shap plot? Can I merge a sklearn pipeline with atom? Is it possible to initialize atom with an existing train and test set? There already is an atom text editor. Does this has anything to do with that? There is, indeed, a text editor with the same name and a similar logo. Is this a shameless copy? No. When I started the project, I didn't know about the text editor, and it doesn't require much thinking to come up with the idea of replacing the letter O of the word atom with the image of an atom. How does ATOM relate to AutoML? ATOM is not an AutoML tool since it does not automate the search for an optimal pipeline like well known AutoML tools such as auto-sklearn or TPOT do. Instead, ATOM helps the user find the optimal pipeline himself. One of the goals of this package is to help data scientists produce explainable pipelines, and using an AutoML black box function would impede that. That said, it is possible to integrate a TPOT pipeline with atom through the automl method. Is it possible to run deep learning models? Yes. Deep learning models can be added as custom models to the pipeline as long as they follow sklearn's API . If the dataset is 2-dimensional, everything should work normally. If the dataset has more than 2 dimensions (referred in the documentation as deep learning datasets, often the case for images or text embeddings), only a subset of atom's methods will work. For more information, see the deep learning section of the user guide. Can I run atom's methods on just a subset of the columns? Yes, all data cleaning and feature engineering methods accept a columns parameter to only transform the selected features. For example, to only impute the numerical columns in the dataset we could type atom.impute(strat_num=\"mean\", columns=atom.numerical) . The parameter accepts column names, column indices or a slice object. How can I compare the same model on different datasets? In many occasions you might want to test how a model performs on datasets processed with different pipelines. For this, atom has the branch system . 
Create a new branch for every new pipeline you want to test and use the plot methods to compare all models, independent of the branch it was trained on. Can I train models through atom using a GPU? ATOM doesn't fit the models himself. The underlying models' package does. Since the majority of predefined models are implemented through sklearn and sklearn works on CPU only, they can not be trained on any GPU. If you are using a custom model whose package, Keras for example, allows GPU implementation and the settings or model parameters are tuned to do so, the model will train on the GPU like it would do outside atom. How are numerical and categorical columns differentiated? The columns are separated using pandas' select_dtypes method for dataframes. Numerical columns are selected using include=\"number\" whereas categorical columns are selected using exclude=\"number\" . Can I run unsupervised learning pipelines? No. As for now, ATOM only supports supervised machine learning pipelines. However, various unsupervised algorithms can be chosen as strategy in the Pruner class to detect and remove outliers from the dataset. Is there a way to plot multiple models in the same shap plot? No. Unfortunately, there is no way to plot multiple models in the same shap plot since the plots are made by the SHAP package and passed as matplotlib.axes objects to atom. This means that it's not within the reach of this package to implement such an utility. Can I merge a sklearn pipeline with atom? Yes. Like any other transformer, it is possible to add a sklearn pipeline to atom using the add method. Every transformer in the pipeline is merged independently. The pipeline is not allowed to end with a model since atom manages its own models. If that is the case, add the pipeline using atom.add(pipeline[:-1]) . Is it possible to initialize atom with an existing train and test set? Yes. If you already have a separated train and test set you can initialize atom in two ways: atom = ATOMClassifier(train, test) atom = ATOMClassifier((X_train, y_train), (X_test, y_test)) Make sure the train and test size have the same number of columns. If initialized like this, the test_size parameter is ignored.","title":"FAQ"},{"location":"faq/#frequently-asked-questions","text":"There already is an atom text editor. Does this has anything to do with that? How does ATOM relate to AutoML? Is it possible to run deep learning models? Can I run atom's methods on just a subset of the columns? How can I compare the same model on different datasets? Can I train models through atom using a GPU? How are numerical and categorical columns differentiated? Can I run unsupervised learning pipelines? Is there a way to plot multiple models in the same shap plot? Can I merge a sklearn pipeline with atom? Is it possible to initialize atom with an existing train and test set?","title":"Frequently asked questions"},{"location":"faq/#there-already-is-an-atom-text-editor-does-this-has-anything-to-do-with-that","text":"There is, indeed, a text editor with the same name and a similar logo. Is this a shameless copy? No. When I started the project, I didn't know about the text editor, and it doesn't require much thinking to come up with the idea of replacing the letter O of the word atom with the image of an atom.","title":"There already is an atom text editor. 
Does this has anything to do with that?"},{"location":"faq/#how-does-atom-relate-to-automl","text":"ATOM is not an AutoML tool since it does not automate the search for an optimal pipeline like well known AutoML tools such as auto-sklearn or TPOT do. Instead, ATOM helps the user find the optimal pipeline himself. One of the goals of this package is to help data scientists produce explainable pipelines, and using an AutoML black box function would impede that. That said, it is possible to integrate a TPOT pipeline with atom through the automl method.","title":"How does ATOM relate to AutoML?"},{"location":"faq/#is-it-possible-to-run-deep-learning-models","text":"Yes. Deep learning models can be added as custom models to the pipeline as long as they follow sklearn's API . If the dataset is 2-dimensional, everything should work normally. If the dataset has more than 2 dimensions (referred in the documentation as deep learning datasets, often the case for images or text embeddings), only a subset of atom's methods will work. For more information, see the deep learning section of the user guide.","title":"Is it possible to run deep learning models?"},{"location":"faq/#can-i-run-atoms-methods-on-just-a-subset-of-the-columns","text":"Yes, all data cleaning and feature engineering methods accept a columns parameter to only transform the selected features. For example, to only impute the numerical columns in the dataset we could type atom.impute(strat_num=\"mean\", columns=atom.numerical) . The parameter accepts column names, column indices or a slice object.","title":"Can I run atom's methods on just a subset of the columns?"},{"location":"faq/#how-can-i-compare-the-same-model-on-different-datasets","text":"In many occasions you might want to test how a model performs on datasets processed with different pipelines. For this, atom has the branch system . Create a new branch for every new pipeline you want to test and use the plot methods to compare all models, independent of the branch it was trained on.","title":"How can I compare the same model on different datasets?"},{"location":"faq/#can-i-train-models-through-atom-using-a-gpu","text":"ATOM doesn't fit the models himself. The underlying models' package does. Since the majority of predefined models are implemented through sklearn and sklearn works on CPU only, they can not be trained on any GPU. If you are using a custom model whose package, Keras for example, allows GPU implementation and the settings or model parameters are tuned to do so, the model will train on the GPU like it would do outside atom.","title":"Can I train models through atom using a GPU?"},{"location":"faq/#how-are-numerical-and-categorical-columns-differentiated","text":"The columns are separated using pandas' select_dtypes method for dataframes. Numerical columns are selected using include=\"number\" whereas categorical columns are selected using exclude=\"number\" .","title":"How are numerical and categorical columns differentiated?"},{"location":"faq/#can-i-run-unsupervised-learning-pipelines","text":"No. As for now, ATOM only supports supervised machine learning pipelines. However, various unsupervised algorithms can be chosen as strategy in the Pruner class to detect and remove outliers from the dataset.","title":"Can I run unsupervised learning pipelines?"},{"location":"faq/#is-there-a-way-to-plot-multiple-models-in-the-same-shap-plot","text":"No. 
Unfortunately, there is no way to plot multiple models in the same shap plot since the plots are made by the SHAP package and passed as matplotlib.axes objects to atom. This means that it's not within the reach of this package to implement such an utility.","title":"Is there a way to plot multiple models in the same shap plot?"},{"location":"faq/#can-i-merge-a-sklearn-pipeline-with-atom","text":"Yes. Like any other transformer, it is possible to add a sklearn pipeline to atom using the add method. Every transformer in the pipeline is merged independently. The pipeline is not allowed to end with a model since atom manages its own models. If that is the case, add the pipeline using atom.add(pipeline[:-1]) .","title":"Can I merge a sklearn pipeline with atom?"},{"location":"faq/#is-it-possible-to-initialize-atom-with-an-existing-train-and-test-set","text":"Yes. If you already have a separated train and test set you can initialize atom in two ways: atom = ATOMClassifier(train, test) atom = ATOMClassifier((X_train, y_train), (X_test, y_test)) Make sure the train and test size have the same number of columns. If initialized like this, the test_size parameter is ignored.","title":"Is it possible to initialize atom with an existing train and test set?"},{"location":"getting_started/","text":"Installation Install ATOM's newest release easily via pip : $ pip install -U atom-ml or via conda : $ conda install -c conda-forge atom-ml Note that using these commands also install/update all required dependencies . To install the optional dependencies , add [models] after the package's name. $ pip install -U atom-ml[models] Note Since atom was already taken, download the package under the name atom-ml ! Usage Call the ATOMClassifier or ATOMRegressor class and provide the data you want to use: from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y) atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) ATOM has multiple data cleaning methods to help you prepare the data for modelling: atom.impute(strat_num=\"knn\", strat_cat=\"most_frequent\", min_frac_rows=0.1) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.05) atom.feature_selection(strategy=\"PCA\", n_features=12) Train and evaluate the models you want to compare: atom.run( models=[\"LR\", \"LDA\", \"XGB\", \"lSVM\"], metric=\"f1\", n_calls=25, n_initial_points=10, bagging=4, ) Make plots to analyze the results: atom.plot_results(figsize=(9, 6), filename=\"bagging_results.png\") atom.lda.plot_confusion_matrix(normalize=True, filename=\"cm.png\")","title":"Getting started"},{"location":"getting_started/#installation","text":"Install ATOM's newest release easily via pip : $ pip install -U atom-ml or via conda : $ conda install -c conda-forge atom-ml Note that using these commands also install/update all required dependencies . To install the optional dependencies , add [models] after the package's name. 
$ pip install -U atom-ml[models] Note Since atom was already taken, download the package under the name atom-ml !","title":"Installation"},{"location":"getting_started/#usage","text":"Call the ATOMClassifier or ATOMRegressor class and provide the data you want to use: from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y) atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) ATOM has multiple data cleaning methods to help you prepare the data for modelling: atom.impute(strat_num=\"knn\", strat_cat=\"most_frequent\", min_frac_rows=0.1) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.05) atom.feature_selection(strategy=\"PCA\", n_features=12) Train and evaluate the models you want to compare: atom.run( models=[\"LR\", \"LDA\", \"XGB\", \"lSVM\"], metric=\"f1\", n_calls=25, n_initial_points=10, bagging=4, ) Make plots to analyze the results: atom.plot_results(figsize=(9, 6), filename=\"bagging_results.png\") atom.lda.plot_confusion_matrix(normalize=True, filename=\"cm.png\")","title":"Usage"},{"location":"license/","text":"MIT License Copyright (c) 2020 tvdboom Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"License"},{"location":"license/#mit-license","text":"Copyright (c) 2020 tvdboom Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"user_guide/","text":"Introduction There is no magic formula in data science that can tell us which type of machine learning estimator in combination with which pipeline will perform best for a given raw dataset. Different models are better suited for different types of data and different types of problems. At best, you can follow some rough guide on how to approach problems with regard to which model to try on your data, but these are incomplete at best. During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case? Although best practices tell us to start with a simple model and build up to more complicated ones, many data scientists just use the model best known to them in order to avoid the aforementioned problems. This can result in poor performance (because the model is just not the right one for the task) or in inefficient management of time and computing resources (because a simpler/faster model could have achieved a similar performance). ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand. It is important to realize that ATOM is not here to replace all the work a data scientist has to do before getting his model into production. ATOM doesn't spit out production-ready models just by tuning some parameters in its API. After helping you determine the right pipeline, you will most probably need to fine-tune it using use-case specific features and data cleaning steps in order to achieve maximum performance. Nomenclature In this documentation we will consistently use terms to refer to certain concepts related to this package. atom : Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name). ATOM : Refers to this package. branch : Collection of estimators in the pipeline fitted to a specific dataset. See the branches section. BO : Bayesian optimization algorithm used for hyperparameter optimization. categorical columns : Refers to all non-numerical columns. class : Unique value in a column, e.g. a binary classifier has 2 classes in the target column. 
estimator : An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. missing values : Values in the missing attribute. model : Instance of a model in the pipeline. outlier : Sample that contains one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy. outlier value : Value that lies further than 3 times the standard_deviation away from the mean of its column (|z-score| > 3). pipeline : All the content in atom for a specific branch. predictor : An estimator implementing a predict method. This encompasses all classifiers and regressors. scorer : A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation . sequence : A one-dimensional array of variable type list , tuple , np.ndarray or pd.Series . target : Name of the dependent variable, passed as y to an estimator's fit method. task : One of the three supervised machine learning approaches that ATOM supports: binary classification multiclass classification regression trainer : Instance of a class that train and evaluate the models (implement a run method). The following classes are considered trainers: ATOMClassifier ATOMRegressor DirectClassifier DirectRegressor SuccessiveHalvingClassifier SuccessiveHavingRegressor TrainSizingClassifier TrainSizingRegressor transformer : An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes. First steps You can quickly install atom using pip or conda , see the installation guide . ATOM contains a variety of classes to perform data cleaning, feature engineering, model training, plotting and much more. The easiest way to use everything ATOM has to offer is through one of the main classes: ATOMClassifier for binary or multiclass classification tasks. ATOMRegressor for regression tasks. These two classes are convenient wrappers for the whole machine learning pipeline. Like a sklearn Pipeline , they assemble several steps that can be cross-validated together while setting different parameters. There are some important differences with sklearn's API: atom is initialized with the data you want to manipulate. This data can be accessed at any moment through atom's data attributes . The classes in ATOM's API are reached through atom's methods. For example, calling the encode method will initialize an Encoder instance, fit it on the training set and transform the whole dataset. The transformations are applied immediately after calling the method (there is no fit method). This approach gives the user a clearer overview and more control over every step in the pipeline. Let's get started with an example! First, initialize atom and provide it the data you want to use. You can either input a dataset and let ATOM split the train and test set or provide a train and test set already split. Note that if a dataframe is provided, the indices are reset by atom. atom = ATOMClassifier(X, y, test_size=0.25) Apply data cleaning methods through the class. For example, calling the impute method will handle all missing values in the dataset. atom.impute(strat_num=\"median\", strat_cat=\"most_frequent\", min_frac_rows=0.1) Select the best hyperparameters and fit a Random Forest and AdaBoost model. 
atom.run([\"RF\", \"AdaB\"], metric=\"accuracy\", n_calls=25, n_initial_points=10) Analyze the results: atom.feature_importances(show=10, filename=\"feature_importance_plot\") atom.plot_prc(title=\"Precision-recall curve comparison plot\") Data pipelines It may happen that you want to compare how a model performs on different datasets. For example, on one dataset balanced with an undersampling strategy and the other with an oversampling strategy. For this, atom has data pipelines. Branches Data pipelines manage separate paths atom's dataset can take. The paths are called branches and can be accessed through the branch attribute. Calling it will show the branches in the pipeline. The current branch is indicated with ! . A branch contains a specific dataset, and the transformers it took to arrive to that dataset from the one atom initialized with. Accessing data attributes such as atom.dataset will return the data in the current branch. Use the pipeline attribute to see the estimators in the branch. All data cleaning, feature engineering and trainers called will use the dataset in the current branch. This means that models are trained and validated on the data in that branch. Don't change the data in a branch after fitting a model, this can cause unexpected model behaviour. Instead, create a new branch for every unique model pipeline. By default, atom starts with one branch called \"master\". To start a new branch, set a new name to the property, e.g. atom.branch = \"new_branch\" . This will start a new branch from the current one. To create a branch from any other branch type \"_from_\" between the new name and the branch from which to split, e.g. atom.branch = \"branch2_from_branch1\" will create branch \"branch2\" from branch \"branch1\". To switch between existing branches, just type the name of the desired branch, e.g. atom.branch = \"master\" to go back to the main branch. Note that every branch contains a unique copy of the whole dataset! Creating many branches can cause memory issues for large datasets. You can delete a branch either deleting the attribute, e.g. del atom.branch , or using the delete method, e.g. atom.branch.delete() . A branch can only be deleted if no models were trained on its dataset. Use atom.branch.status() to print a list of the transformers and models in the branch. See the Imbalanced datasets or Feature engineering examples for branching use cases. Warning Always create a new branch if you want to change the dataset after fitting a model! Not doing so can cause unexpected model behaviour. Data transformations Performing data transformations is a common requirement of many datasets before they are ready to be ingested by a model. ATOM provides various classes to apply data cleaning and feature engineering transformations to the data. This tooling should be able to help you apply most of the typically needed transformations to get the data ready for modelling. For further fine-tuning, it is also possible to pre-process the data using custom transformers. They can be added to the pipeline using atom's add method. Remember that all transformations are only applied to the dataset in the current branch. AutoML Automated machine learning (AutoML) automates the selection, composition and parameterization of machine learning pipelines. Automating the machine learning process makes it more user-friendly and often provides faster, more accurate outputs than hand-coded algorithms. ATOM uses the TPOT package for AutoML optimization. 
TPOT uses a genetic algorithm to intelligently explore thousands of possible pipelines in order to find the best one for your data. Such an algorithm can be started through the automl method. The resulting data transformers and final estimator are merged with atom's pipeline (check the pipeline and models attributes after the method finishes running). Warning AutoML algorithms aren't intended to run for only a few minutes. If left to its default parameters, the method can take a very long time to finish! Data cleaning More often than not, you need to do some data cleaning before fitting your dataset to a model. Usually, this involves importing different libraries and writing many lines of code. Since ATOM is all about fast exploration and experimentation, it provides various data cleaning classes to apply the most common transformations fast and easy. Note All of atom's data cleaning methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.scale(verbose=2) . Note Like the add method, the data cleaning methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.scale(columns=[0, 1]) . Scaling the feature set Standardization of a dataset is a common requirement for many machine learning estimators; they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). The Scaler class let you quickly scale atom's dataset using one of sklearn's scalers. It can be accessed from atom through the scale method. Standard data cleaning There are many data cleaning steps that are useful to perform on any dataset before modelling. These are general rules that apply almost on every use-case and every task. The Cleaner class is a convenient tool to apply such steps. It can be accessed from atom through the clean method. Use the class' parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. Imputing missing values For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. Such datasets however are incompatible with ATOM's models which assume that all values in an array are numerical, and that all have and hold meaning. The Imputer class handles missing values in the dataset by either dropping or imputing the value. It can be accessed from atom through the impute method. Tip Use atom's missing attribute to check the amount of missing values per feature. Encoding categorical features Many datasets will contain categorical features. Their variables are typically stored as text values which represent various traits. Some examples include color (\u201cRed\u201d, \u201cYellow\u201d, \u201cBlue\u201d), size (\u201cSmall\u201d, \u201cMedium\u201d, \u201cLarge\u201d) or geographic designations (city or country). Regardless of what the value is used for, the challenge is determining how to use this data in the analysis. ATOM's models don't support direct manipulation of this kind of data. Use the Encoder class to encode categorical features to numerical values. 
It can be accessed from atom through the encode method. Tip Use atom's categorical attribute for a list of the categorical features in the dataset. Handling outliers When modeling, it is important to clean the data sample to ensure that the observations best represent the problem. Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers. Often, machine learning modeling and model skill in general can be improved by understanding and even removing these outlier samples. The Pruner class offers 5 different strategies to detect outliers (described hereunder). It can be accessed from atom through the prune method. Tip Use atom's outliers attribute to check the number of outliers per column. z-score The z-score of a value in the dataset is defined as the number of standard deviations by which the value is above or below the mean of the column. Values above or below a certain threshold (specified with the parameter max_sigma ) are considered outliers. Note that, contrary to the rest of the strategies, this approach selects outlier values, not outlier samples! Because of this, it is possible to replace the outlier value instead of simply dropping the sample. Isolation Forest Uses a tree-based anomaly detection algorithm. It is based on modeling the normal data in such a way as to isolate anomalies that are both few and different in the feature space. Read more in sklearn's documentation . Elliptic Envelope If the input variables have a Gaussian distribution, then simple statistical methods can be used to detect outliers. For example, if the dataset has two input variables and both are Gaussian, then the feature space forms a multi-dimensional Gaussian and knowledge of this distribution can be used to identify values far from the distribution. This approach can be generalized by defining a hypersphere (ellipsoid) that covers the normal data, and data that falls outside this shape is considered an outlier. Read more in sklearn's documentation . Local Outlier Factor A simple approach to identifying outliers is to locate those examples that are far from the other examples in the feature space. This can work well for feature spaces with low dimensionality (few features) but becomes less reliable as the number of features is increased. This is referred to as the curse of dimensionality. The local outlier factor is a technique that attempts to harness the idea of nearest neighbors for outlier detection. Each example is assigned a scoring of how isolated or how likely it is to be outliers based on the size of its local neighborhood. Those examples with the largest score are more likely to be outliers. Read more in sklearn's documentation . One-class SVM The support vector machine algorithm developed initially for binary classification can be used for one-class classification. When modeling one class, the algorithm captures the density of the majority class and classifies examples on the extremes of the density function as outliers. This modification of SVM is referred to as One-Class SVM. Read more in sklearn's documentation . DBSCAN The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means which assumes that clusters are convex shaped. Samples that lie outside any cluster are considered outliers. Read more in sklearn's documentation . 
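As a rough illustration of how the data cleaning steps in this section chain together from atom (the strategy names and parameter values below are examples only; check each class for the exact options and spellings): 
# atom is an ATOMClassifier/ATOMRegressor instance, as created in "First steps" above
atom.scale()                                                 # scale the feature set
atom.clean()                                                 # standard cleaning steps (duplicates, target encoding, ...)
atom.impute(strat_num="median", strat_cat="most_frequent")   # handle missing values
atom.encode(strategy="LeaveOneOut", max_onehot=8)            # encode categorical features
atom.prune(strategy="iForest")                               # e.g. drop outlier samples with Isolation Forest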
Balancing the data
One of the common issues found in datasets that are used for classification is imbalanced classes. Data imbalance usually reflects an unequal distribution of classes within a dataset. For example, in a credit card fraud detection dataset most of the transactions are non-fraud, and only a few cases are fraud. This leaves us with a very unbalanced ratio of fraud vs non-fraud cases. The Balancer class can oversample the minority class or undersample the majority class using any of the transformers implemented in imblearn. It can be accessed from atom through the balance method.
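A short sketch of balancing the training set (the "smote" value is an illustrative imblearn strategy name, not a verified default):

```python
# Oversample the minority class; any imblearn sampler name should work here
atom.balance(strategy="smote")

# Only the training set is resampled; the test set is left untouched
print(atom.train.shape)
```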
Feature engineering
"Applied machine learning" is basically feature engineering. ~ Andrew Ng.

Feature engineering is the process of creating new features from the existing ones, in order to capture relationships with the target column that the first set of features didn't have on their own. This process is very important to improve the performance of machine learning algorithms. Although feature engineering works best when the data scientist applies use-case-specific transformations, there are ways to do this in an automated manner, without prior domain knowledge. One of the problems of creating new features without human expert intervention is that many of the newly created features can be useless, i.e. they do not help the algorithm to make better predictions. Even worse, having useless features can hurt performance. To avoid this, we perform feature selection, a process in which we select only the relevant features in the dataset. See the Feature engineering example.

Note: All of atom's feature engineering methods automatically adopt the relevant transformer attributes (n_jobs, verbose, logger, random_state) from atom. A different choice can be added as a parameter to the method call, e.g. atom.feature_selection("SFM", solver="LGB", n_features=10, n_jobs=4).
Note: Like the add method, the feature engineering methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.feature_selection("SFM", solver="LGB", n_features=10, columns=slice(5, 15)).

Generating new features
The FeatureGenerator class creates new non-linear features based on the original feature set. It can be accessed from atom through the feature_generation method. You can choose between two strategies: Deep Feature Synthesis and Genetic Feature Generation.

Deep Feature Synthesis
Deep feature synthesis (DFS) applies the selected operators on the features in the dataset. For example, if the operator is "log", it will create the new feature LOG(old_feature), and if the operator is "mul", it will create the new feature old_feature_1 x old_feature_2. The operators can be chosen through the operators parameter. Available options are:
- add: Sum two features together.
- sub: Subtract two features from each other.
- mul: Multiply two features with each other.
- div: Divide two features by each other.
- sqrt: Take the square root of a feature.
- log: Take the logarithm of a feature.
- sin: Calculate the sine of a feature.
- cos: Calculate the cosine of a feature.
- tan: Calculate the tangent of a feature.

ATOM's implementation of DFS uses the featuretools package.

Tip: DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features!
Warning: Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property.
Warning: When using DFS with n_jobs>1, make sure to protect your code with if __name__ == "__main__". Featuretools uses dask, which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task.

Genetic Feature Generation
Genetic feature generation (GFG) uses genetic programming, a branch of evolutionary programming, to determine which features are successful and create new ones based on those. Where DFS can be seen as a kind of "brute force" approach to feature engineering, GFG tries to improve its features with every generation of the algorithm. GFG uses the same operators as DFS, but instead of only applying the transformations once, it evolves them further, creating complicated non-linear combinations of features with many transformations. The new features are given the name Feature N for the N-th feature. You can access the genetic features' fitness and description (how they are calculated) through the genetic_features attribute. ATOM uses the SymbolicTransformer class from the gplearn package for the genetic algorithm. Read more about this implementation here.

Warning: GFG can be slow for very large populations!

Selecting useful features
The FeatureSelector class provides tooling to select the relevant features from a dataset. It can be accessed from atom through the feature_selection method. The following strategies are implemented: univariate, PCA, SFM, RFE and RFECV.

Univariate
Univariate feature selection works by selecting the best features based on univariate statistical tests (F-test). The test is provided via the solver parameter. It takes any function that accepts two arrays (X, y) and returns arrays (scores, p-values). Read more in sklearn's documentation.

Principal Components Analysis
Applying PCA will reduce the dimensionality of the dataset by maximizing the variance of each dimension. The new features are called Component 1, Component 2, etc... The data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Read more in sklearn's documentation.

Selection from model
SFM uses an estimator with feature_importances_ or coef_ attributes to select the best features in a dataset based on importance weights. The estimator is provided through the solver parameter and can already be fitted. ATOM allows you to use one of its predefined models, e.g. solver="RF". If you didn't call the FeatureSelector through atom, don't forget to indicate the estimator's task by adding _class or _reg after the name, e.g. RF_class to use a random forest classifier. Read more in sklearn's documentation.
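The feature_selection call below mirrors the example used in the notes above; the feature_generation arguments are assumptions based on the description of DFS and may differ from the actual API:

```python
# Generate new non-linear features with deep feature synthesis
# (strategy/n_features/operators argument names are assumptions)
atom.feature_generation(strategy="DFS", n_features=20, operators=["add", "mul", "log"])

# Keep the 10 most important features according to a LightGBM estimator (SFM)
atom.feature_selection("SFM", solver="LGB", n_features=10)
```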
Recursive feature elimination
Select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features, and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached. Note that, since RFE needs to fit the model again every iteration, this method can be fairly slow.

RFECV applies the same algorithm as RFE but uses a cross-validated metric (under the scoring parameter, see RFECV) to assess every step's performance. Also, where RFE returns the number of features selected by n_features, RFECV returns the number of features that achieved the optimal score on the specified metric. Note that this is not always equal to the amount specified by n_features. Read more in sklearn's documentation.

Removing features with low variance
Variance is the expectation of the squared deviation of a random variable from its mean. Features with low variance have many values repeated, which means the model will not learn much from them. FeatureSelector removes all features where the same value is repeated in at least max_frac_repeated fraction of the rows. The default option is to remove a feature if all values in it are the same. Read more in sklearn's documentation.

Removing features with multi-collinearity
Two features that are highly correlated are redundant, i.e. together they contribute little more to the model than either one alone. FeatureSelector will drop a feature that has a Pearson correlation coefficient larger than max_correlation with another feature. A correlation of 1 means the two columns are equal. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute.

Tip: Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead.
Warning: The RFE and RFECV strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.

Models

Predefined models
ATOM provides 31 models for classification and regression tasks that can be used to fit the data in the pipeline. After fitting, every model class is attached to the trainer as an attribute. We refer to these "subclasses" as models (see the nomenclature). The classes contain a variety of attributes and methods to help you understand how the underlying estimator performed. They can be accessed using their acronyms, e.g. atom.LGB to access the LightGBM model. The available models and their corresponding acronyms are:
- "GP" for Gaussian Process
- "GNB" for Gaussian Naive Bayes
- "MNB" for Multinomial Naive Bayes
- "BNB" for Bernoulli Naive Bayes
- "CatNB" for Categorical Naive Bayes
- "CNB" for Complement Naive Bayes
- "OLS" for Ordinary Least Squares
- "Ridge" for Ridge Classification/Regression
- "Lasso" for Lasso Regression
- "EN" for Elastic Net
- "BR" for Bayesian Ridge
- "ARD" for Automated Relevance Determination
- "LR" for Logistic Regression
- "LDA" for Linear Discriminant Analysis
- "QDA" for Quadratic Discriminant Analysis
- "KNN" for K-Nearest Neighbors
- "RNN" for Radius Nearest Neighbors
- "Tree" for Decision Tree
- "Bag" for Bagging
- "ET" for Extra-Trees
- "RF" for Random Forest
- "AdaB" for AdaBoost
- "GBM" for Gradient Boosting Machine
- "XGB" for XGBoost
- "LGB" for LightGBM
- "CatB" for CatBoost
- "lSVM" for Linear-SVM
- "kSVM" for Kernel-SVM
- "PA" for Passive Aggressive
- "SGD" for Stochastic Gradient Descent
- "MLP" for Multi-layer Perceptron

Tip: The acronyms are case insensitive. You can also use lowercase to call the models, e.g. atom.lgb.
Warning: The models should not be initialized by the user! Only use them through the trainers.

Custom models
It is also possible to use your own models in ATOM's pipeline. For example, imagine we want to use sklearn's Lars estimator (note that it is not included in ATOM's predefined models). There are two ways to achieve this:

Using ATOMModel (recommended). With this approach you can pass the required model characteristics to the pipeline.

    from sklearn.linear_model import Lars
    from atom import ATOMRegressor, ATOMModel

    model = ATOMModel(models=Lars, fullname="Lars Regression", needs_scaling=True, type="linear")

    atom = ATOMRegressor(X, y)
    atom.run(model)

Using the estimator's class or an instance of the class. This approach will also call ATOMModel under the hood, but it will leave its parameters at their default values.

    from sklearn.linear_model import Lars
    from atom import ATOMRegressor, ATOMModel

    atom = ATOMRegressor(X, y)
    atom.run(models=Lars)

Additional things to take into account:
- Custom models are not restricted to sklearn estimators, but they should follow sklearn's API, i.e. have a fit and a predict method.
- Parameter customization (for the initializer) is only possible for custom models that provide an estimator's class or an instance with a set_params() method, i.e. a child class of BaseEstimator.
- Hyperparameter optimization for custom models is ignored unless appropriate dimensions are provided through bo_params.
- If the estimator has an n_jobs and/or random_state parameter that is left at its default value, it will automatically adopt the values from the trainer it's called from.

Deep learning
Deep learning models can be used through ATOM's custom models as long as they follow sklearn's API. For example, models implemented with the Keras package should use the sklearn wrappers KerasClassifier or KerasRegressor. Many deep learning models, for example in computer vision and natural language processing, use datasets with more than 2 dimensions, e.g. image data can have shape (n_samples, length, width, rgb). These data structures are not intended to be stored in a two-dimensional pandas dataframe. Since ATOM requires a dataframe for the dataset, multidimensional datasets are stored in a single column called "Features", where every row contains one (multidimensional) sample. Note that, because of this, the data cleaning, feature engineering and some of the plotting methods are unavailable for deep learning datasets. See in this example how to use ATOM to train and validate a Convolutional Neural Network implemented with Keras.
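As a rough sketch (not taken from the ATOM documentation), a tabular Keras network could be passed in as a custom model like this. The wrapper's import path depends on your Keras/TensorFlow version and the network architecture is a placeholder:

```python
# NOTE: the wrapper location differs between Keras/TensorFlow versions (assumption)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

from atom import ATOMClassifier

n_features = X.shape[1]  # X and y are placeholders for your own 2-dimensional data

def build_nn():
    # Small fully connected network for tabular (2-dimensional) data
    model = Sequential()
    model.add(Dense(32, activation="relu", input_dim=n_features))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

atom = ATOMClassifier(X, y)

# The wrapper exposes sklearn's fit/predict API, so the instance can be
# passed to run like any other custom estimator
atom.run(models=KerasClassifier(build_nn, epochs=5, verbose=0))
```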
Training
The training phase is where the models are fitted and evaluated. After this, the models are attached to the trainer, and you can use the plotting and predicting methods. The pipeline applies the following steps iteratively for all models:
1. The optimal hyperparameters are selected.
2. The model is trained on the training set and evaluated on the test set.
3. The bagging algorithm is applied.

There are three approaches to run the training:
- Direct training: DirectClassifier, DirectRegressor
- Training via successive halving: SuccessiveHalvingClassifier, SuccessiveHalvingRegressor
- Training via train sizing: TrainSizingClassifier, TrainSizingRegressor

The direct fashion repeats the aforementioned steps only once, while the other two approaches repeat them more than once. Every approach can be called directly from atom through the run, successive_halving and train_sizing methods respectively.

Models are called through their acronyms, e.g. atom.run(models="RF") will train a Random Forest. If you want to run the same model multiple times, add a tag after the acronym to differentiate them.

    atom.run(
        models=["RF1", "RF2"],
        est_params={"RF1": {"n_estimators": 100}, "RF2": {"n_estimators": 200}},
    )

For example, this pipeline will fit two Random Forest models, one with 100 and the other with 200 decision trees. The models can be accessed through atom.rf1 and atom.rf2. Use tagged models to test how the same model performs when fitted with different parameters or on different data sets. See the Imbalanced datasets example.

Additional things to take into account:
- Models that need feature scaling will do so automatically before training if the data is not already scaled.
- If an exception is encountered while fitting an estimator, the pipeline will automatically jump to the next model. The errors are stored in the errors attribute. Note that when a model is skipped, there is no model subclass for that estimator.
- When showing the final results, a ! indicates the highest score and a ~ indicates that the model is possibly overfitting (the training set has a score at least 20% higher than the test set).
- The winning model (the one with the highest mean_bagging or metric_test) can be accessed through the winner attribute.

Metric
ATOM uses sklearn's SCORERS for model selection and evaluation. A scorer consists of a metric function and some parameters that define the scorer's properties, such as whether it's a score or a loss function and whether the function needs probability estimates or rounded predictions (see make_scorer). ATOM lets you define the scorer for the pipeline in three ways:
- The metric parameter is one of sklearn's predefined scorers (as string).
- The metric parameter is a score (or loss) function with signature metric(y, y_pred, **kwargs). In this case, use the greater_is_better, needs_proba and needs_threshold parameters to specify the scorer's properties.
- The metric parameter is a scorer object.

Note that all scorers follow the convention that higher return values are better than lower return values. Thus, metrics which measure the distance between the model and the data (i.e. loss functions), like max_error or mean_squared_error, will return the negated value of the metric.

Custom scorer acronyms
Since some of sklearn's scorers have quite long names and ATOM is all about lazy, fast experimentation, the package provides acronyms for some of the most commonly used ones. These acronyms are case-insensitive and can be used in the metric parameter instead of the scorer's full name, e.g. atom.run("LR", metric="BA") will use balanced_accuracy. The available acronyms are:
- "AP" for "average_precision"
- "BA" for "balanced_accuracy"
- "AUC" for "roc_auc"
- "LogLoss" for "neg_log_loss"
- "EV" for "explained_variance"
- "ME" for "max_error"
- "MAE" for "neg_mean_absolute_error"
- "MSE" for "neg_mean_squared_error"
- "RMSE" for "neg_root_mean_squared_error"
- "MSLE" for "neg_mean_squared_log_error"
- "MEDAE" for "neg_median_absolute_error"
- "POISSON" for "neg_mean_poisson_deviance"
- "GAMMA" for "neg_mean_gamma_deviance"

Multi-metric runs
Sometimes it is useful to measure the performance of the models in more than one way. ATOM lets you run the pipeline with multiple metrics at the same time. To do so, provide the metric parameter with a list of desired metrics, e.g. atom.run("LDA", metric=["r2", "mse"]).
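For example, a run that mixes a predefined scorer with a custom metric function might look like the sketch below (the f2_score helper is purely illustrative):

```python
from sklearn.metrics import fbeta_score

def f2_score(y_true, y_pred):
    # Custom metric with the metric(y, y_pred) signature described above
    return fbeta_score(y_true, y_pred, beta=2)

# Both metrics are scores (not losses), so greater_is_better can stay True
atom.run("LR", metric=["f1", f2_score], greater_is_better=True)
```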
If you provide metric functions, don't forget to also provide lists to the greater_is_better, needs_proba and needs_threshold parameters, where the n-th value in the list corresponds to the n-th function. If you leave them as a single value, that value will apply to every provided metric.

When fitting multi-metric runs, the resulting score attributes will return a list of values, one per metric. For example, if you provided three metrics to the pipeline, atom.knn.metric_bo could return [0.8734, 0.6672, 0.9001]. It is also important to note that only the first metric of a multi-metric run is used to evaluate every step of the bayesian optimization and to select the winning model.

Tip: Some plots let you choose which of the metrics to show using the metric parameter.

Parameter customization
By default, every estimator uses the default parameters it gets from its respective package. To select different ones, use est_params. There are two ways to add custom parameters to the models: adding them directly to the dictionary as key-value pairs, or through multiple dicts with the model names as keys.

Adding the parameters directly to est_params will share them across all models in the pipeline. In this example, both the XGBoost and the LightGBM model will use n_estimators=200. Make sure all the models have the specified parameters or an exception will be raised!

    atom.run(["XGB", "LGB"], est_params={"n_estimators": 200})

To specify parameters per model, use the model name as key and a dict of the parameters as value. In this example, the XGBoost model will use n_estimators=200 and the Multi-layer Perceptron will use one hidden layer with 75 neurons.

    atom.run(["XGB", "MLP"], est_params={"XGB": {"n_estimators": 200}, "MLP": {"hidden_layer_sizes": (75,)}})

Some estimators allow you to pass extra parameters to the fit method (besides X and y). This can be done by adding _fit to the end of the parameter name. For example, to change XGBoost's verbosity, we can run:

    atom.run("XGB", est_params={"verbose_fit": True})

Note: If a parameter is specified through est_params, it is ignored by the bayesian optimization!

Hyperparameter optimization
In order to achieve maximum performance, we need to tune an estimator's hyperparameters before training it. ATOM provides hyperparameter tuning using a bayesian optimization (BO) approach implemented by skopt. The BO is optimized on the first metric provided with the metric parameter. Each step is either computed by cross-validation on the complete training set or by randomly splitting the training set every iteration into a (sub)training set and a validation set. This process can create some data leakage but ensures maximal use of the provided data. The test set, however, does not contain any leakage and is used to determine the final score of every model. Note that, if the dataset is relatively small, the BO's best score can consistently be lower than the final score on the test set (despite the leakage) due to the considerably fewer instances on which it is trained.

There are many possibilities to tune the BO to your liking. Use n_calls and n_initial_points to determine the number of iterations that are performed randomly at the start (exploration) and the number of iterations spent optimizing (exploitation). If n_calls is equal to n_initial_points, every iteration of the BO will select its hyperparameters randomly. This means the algorithm is technically performing a random search.
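A short sketch combining these knobs with est_params (the values shown are placeholders):

```python
# 15 total BO iterations, of which the first 5 sample hyperparameters randomly.
# n_estimators is fixed through est_params, so the BO will not tune it.
atom.run(
    models="RF",
    metric="f1",
    n_calls=15,
    n_initial_points=5,
    est_params={"n_estimators": 300},
)
```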
Note: The n_calls parameter includes the iterations in n_initial_points, i.e. calling atom.run("LR", n_calls=20, n_initial_points=10) will run 20 iterations, of which the first 10 are random.
Note: If n_initial_points=1, the first trial uses the estimator's default parameters.

Other settings can be changed through the bo_params parameter, a dictionary where every key-value combination can be used to further customize the BO.

By default, the hyperparameters and corresponding dimensions per model are predefined by ATOM. Use the dimensions key to use custom ones. Just like with est_params, you can share the same dimensions across models or use a dictionary with the model names as keys to specify the dimensions for every individual model. Note that the provided search space dimensions must be compliant with skopt's API.

    atom.run("LR", n_calls=10, bo_params={"dimensions": [Integer(100, 1000, name="max_iter")]})

The majority of skopt's callbacks to stop the optimizer early can be accessed through bo_params. You can include other callbacks using the callbacks key.

    atom.run("LR", n_calls=10, bo_params={"max_time": 1000, "callbacks": custom_callback()})

You can also include other parameters for the optimizer as key-value pairs.

    atom.run("LR", n_calls=10, bo_params={"acq_func": "EI"})

Bagging
After fitting the estimator, you can assess the robustness of the model using bootstrap aggregating (bagging). This technique creates several new data sets by selecting random samples from the training set (with replacement) and evaluates them on the test set. This way we get a distribution of the performance of the model. The number of sets can be chosen through the bagging parameter.

Tip: Use the plot_results method to plot the bagging scores in a boxplot.

Early stopping
XGBoost, LightGBM and CatBoost allow in-training evaluation. This means that the estimator is evaluated after every round of the training, and that the training is stopped early if it didn't improve in the last early_stopping rounds. This can save the pipeline much time that would otherwise be wasted on an estimator that is unlikely to improve further. Note that this technique is applied both during the BO and at the final fit on the complete training set. There are two ways to apply early stopping on these models:
- Through the early_stopping key in bo_params. This approach applies early stopping to all models in the pipeline and allows the input of a fraction of the total number of rounds.
- Filling the early_stopping_rounds parameter directly in est_params. Don't forget to add _fit to the parameter to call it from the fit method.

After fitting, the model will get the evals attribute, a dictionary of the train and test performances per round (also if early stopping wasn't applied). Click here for an example using early stopping.

Tip: Use the plot_evals method to plot the in-training evaluation on the train and test set.

Successive halving
Successive halving is a bandit-based algorithm that fits N models to 1/N of the data. The best half are selected to go to the next iteration, where the process is repeated. This continues until only one model remains, which is fitted on the complete dataset. Beware that a model's performance can depend greatly on the amount of data on which it is trained. For this reason, we recommend using this technique only with similar models, e.g. only tree-based models.
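A minimal sketch of such a run with four tree-based models (n_calls=0 skips the hyperparameter optimization; all values are placeholders):

```python
# Four similar models start; half are dropped every iteration until one
# model remains and is fitted on the complete training set
atom.successive_halving(["Tree", "Bag", "ET", "RF"], metric="accuracy", n_calls=0)

atom.plot_successive_halving()  # score per iteration
```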
Use successive halving through the SuccessiveHalvingClassifier/SuccessiveHalvingRegressor classes or from atom via the successive_halving method. Consecutive runs of the same model are saved with the model's acronym followed by the number of models in the run. For example, a Random Forest in a run with 4 models would become model RF4. Click here for a successive halving example.

Tip: Use the plot_successive_halving method to see every model's performance per iteration of the successive halving.

Train sizing
When training models, there is usually a trade-off between model performance and computation time that is regulated by the number of samples in the training set. Train sizing can be used to create insight into this trade-off, and help to determine the optimal size of the training set, by fitting the models multiple times on an ever-increasing number of training samples. Use train sizing through the TrainSizingClassifier/TrainSizingRegressor classes or from atom via the train_sizing method. The number of iterations and the number of samples per training can be specified with the train_sizes parameter. Consecutive runs of the same model are saved with the model's acronym followed by the fraction of rows in the training set (the . is removed from the fraction!). For example, a Random Forest in a run with 80% of the training samples would become model RF08. Click here for a train sizing example.

Tip: Use the plot_learning_curve method to see the model's performance per size of the training set.

Voting
The idea behind Voting is to combine the predictions of conceptually different models to make new predictions. Such a technique can be useful for a set of equally well performing models in order to balance out their individual weaknesses. Read more in sklearn's documentation. A Voting model is created from a trainer through the voting method. The Voting model is added automatically to the list of models in the pipeline, under the Vote acronym. Although similar, this model is different from the VotingClassifier and VotingRegressor estimators from sklearn.

Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. Plots that require an estimator object will raise an exception.

The Voting class has the same prediction attributes and prediction methods as other models. The predict_proba, predict_log_proba, decision_function and score methods return the average predictions (soft voting) over the models in the instance. Note that these methods will raise an exception if not all estimators in the Voting instance have the specified method. The predict method returns the majority vote (hard voting). The scoring method also returns the average scoring for the selected metric over the models. Click here for a voting example.

Warning: Although it is possible to include models from different branches in the same Voting instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.
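As a sketch, creating and querying the Voting model could look like this (assuming the model is reachable as atom.vote, analogous to the other lowercase model attributes):

```python
atom.voting()  # combine the models currently in the pipeline

atom.vote.scoring()       # average score over the contributing models
atom.vote.predict(X_new)  # hard (majority) vote on new data; X_new is a placeholder
```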
Stacking
Stacking is a method for combining estimators to reduce their biases. More precisely, the predictions of each individual estimator are stacked together and used as input to a final estimator to compute the prediction. Read more in sklearn's documentation. A Stacking model is created from a trainer through the stacking method. The Stacking model is added automatically to the list of models in the pipeline, under the Stack acronym. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. The prediction methods, the scoring method and the plot methods that require an estimator object will use the Stacking model's final estimator, under the estimator attribute. Click here for a stacking example.

Warning: Although it is possible to include models from different branches in the same Stacking instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.

Predicting
After running a successful pipeline, you may want to apply all used transformations onto new data, or make predictions using one of the trained models. Just like a sklearn estimator, you can call the prediction methods from a fitted trainer, e.g. atom.predict(X). Calling the method without specifying a model will use the winning model in the pipeline (under the winner attribute). To use a different model, simply call the method from a model, e.g. atom.KNN.predict(X).

All prediction methods transform the provided data through the data cleaning and feature engineering transformers before making the predictions. By default, this excludes outlier handling and balancing the dataset, since these steps should only be applied on the training set. Use the method's kwargs to select which transformations to use in every call.

The available prediction methods are a selection of the most common methods for estimators in sklearn's API:
- transform: Transform new data through all transformers in a branch.
- predict: Transform new data through all transformers in a branch and return class predictions.
- predict_proba: Transform new data through all transformers in a branch and return class probabilities.
- predict_log_proba: Transform new data through all transformers in a branch and return class log-probabilities.
- decision_function: Transform new data through all transformers in a branch and return predicted confidence scores.
- score: Transform new data through all transformers in a branch and return the model's score.

Except for transform, the prediction methods can be calculated on the train and test set. You can access them through the model's prediction attributes, e.g. atom.mnb.predict_train or atom.mnb.predict_test. Keep in mind that the results are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.

Note: Many of the plots use the prediction attributes. This can considerably increase the size of the class for large datasets. Use the reset_predictions method if you need to free some memory!
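Putting the above together (X_new is a placeholder for unseen data, and the models shown are assumed to be in the pipeline):

```python
atom.predict(X_new)            # transform the data and predict with the winning model
atom.KNN.predict_proba(X_new)  # or use a specific model instead

atom.mnb.predict_test          # cached predictions on the test set (lazily computed)
```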
Plots
After fitting the models to the data, it's time to analyze the results. ATOM provides many plotting methods to compare the model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib, seaborn and shap for plotting.

The plot methods can be called from a trainer directly, e.g. atom.plot_roc(), or from one of the models, e.g. atom.LGB.plot_roc(). If called from a trainer, it will make the plot for all models in the pipeline. This can be useful to compare the results of multiple models. If called from a model, it will make the plot for only that model. Use this option if you want information just for that specific model, or to make a plot less crowded.

Parameters
Apart from the plot-specific parameters they may have, all plots have four parameters in common:
- The title parameter allows you to add a custom title to the plot.
- The figsize parameter adjusts the plot's size.
- The filename parameter is used to save the plot.
- The display parameter determines whether the plot is rendered.

Aesthetics
The plot aesthetics can be customized using the plot attributes, e.g. atom.style = "white". These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. ATOM's default values are:
- style: "darkgrid"
- palette: "GnBu_r_d"
- title_fontsize: 20
- label_fontsize: 16
- tick_fontsize: 12

Use the reset_aesthetics method to reset all the aesthetics to their default value.

Canvas
Sometimes it might be desirable to draw multiple plots side by side in order to compare them more easily. Use atom's canvas method for this. The canvas method is a @contextmanager, i.e. it is used through the with statement. Plots in a canvas ignore the figsize, filename and display parameters. Instead, pass these parameters to the canvas itself for the final figure. For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot could become too messy.

    atom = ATOMClassifier(X, y)
    atom.run(["xgb", "lgb"], n_calls=0)

    with atom.canvas(2, 2, title="XGBoost vs LightGBM", filename="canvas"):
        atom.xgb.plot_roc(dataset="both", title="ROC - XGBoost")
        atom.lgb.plot_roc(dataset="both", title="ROC - LightGBM")
        atom.xgb.plot_prc(dataset="both", title="PRC - XGBoost")
        atom.lgb.plot_prc(dataset="both", title="PRC - LightGBM")

SHAP
The SHAP (SHapley Additive exPlanations) python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM implements methods to plot 7 of SHAP's plotting functions directly from its API. The seven plots are: bar_plot, beeswarm_plot, decision_plot, force_plot, heatmap_plot, scatter_plot and waterfall_plot. Since the plots are not made by ATOM, we can't draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot().

Info: You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot.

Available plots
A list of available plots can be found hereunder. Note that not all plots can be called from every class and that their availability can depend on the task at hand.
- plot_correlation: Plot the data's correlation matrix.
- plot_scatter_matrix: Plot the data's scatter matrix.
- plot_qq: Plot a quantile-quantile plot.
- plot_distribution: Plot column distributions.
- plot_pipeline: Plot a diagram of every estimator in atom's pipeline.
- plot_pca: Plot the explained variance ratio vs the number of components.
- plot_components: Plot the explained variance ratio per component.
- plot_rfecv: Plot the RFECV results.
- plot_successive_halving: Plot the models' scores per iteration of the successive halving.
- plot_learning_curve: Plot the model's learning curve.
- plot_results: Plot a boxplot of the bagging's results.
- plot_bo: Plot the bayesian optimization scoring.
- plot_evals: Plot evaluation curves for the train and test set.
- plot_roc: Plot the Receiver Operating Characteristics curve.
- plot_prc: Plot the precision-recall curve.
- plot_permutation_importance: Plot the feature permutation importance of models.
- plot_feature_importance: Plot a tree-based model's feature importance.
- plot_partial_dependence: Plot the partial dependence of features.
- plot_errors: Plot a model's prediction errors.
- plot_residuals: Plot a model's residuals.
- plot_confusion_matrix: Plot a model's confusion matrix.
- plot_threshold: Plot metric performances against threshold values.
- plot_probabilities: Plot the probability distribution of the classes in the target column.
- plot_calibration: Plot the calibration curve for a binary classifier.
- plot_gains: Plot the cumulative gains curve.
- plot_lift: Plot the lift curve.
- bar_plot: Plot SHAP's bar plot.
- beeswarm_plot: Plot SHAP's beeswarm plot.
- decision_plot: Plot SHAP's decision plot.
- force_plot: Plot SHAP's force plot.
- heatmap_plot: Plot SHAP's heatmap plot.
- scatter_plot: Plot SHAP's scatter plot.
- waterfall_plot: Plot SHAP's waterfall plot.
It is important to realize that ATOM is not here to replace all the work a data scientist has to do before getting his model into production. ATOM doesn't spit out production-ready models just by tuning some parameters in its API. After helping you determine the right pipeline, you will most probably need to fine-tune it using use-case specific features and data cleaning steps in order to achieve maximum performance.","title":"Introduction"},{"location":"user_guide/#nomenclature","text":"In this documentation we will consistently use terms to refer to certain concepts related to this package. atom : Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name). ATOM : Refers to this package. branch : Collection of estimators in the pipeline fitted to a specific dataset. See the branches section. BO : Bayesian optimization algorithm used for hyperparameter optimization. categorical columns : Refers to all non-numerical columns. class : Unique value in a column, e.g. a binary classifier has 2 classes in the target column. estimator : An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. missing values : Values in the missing attribute. model : Instance of a model in the pipeline. outlier : Sample that contains one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy. outlier value : Value that lies further than 3 times the standard_deviation away from the mean of its column (|z-score| > 3). pipeline : All the content in atom for a specific branch. predictor : An estimator implementing a predict method. This encompasses all classifiers and regressors. scorer : A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation . sequence : A one-dimensional array of variable type list , tuple , np.ndarray or pd.Series . target : Name of the dependent variable, passed as y to an estimator's fit method. task : One of the three supervised machine learning approaches that ATOM supports: binary classification multiclass classification regression trainer : Instance of a class that train and evaluate the models (implement a run method). The following classes are considered trainers: ATOMClassifier ATOMRegressor DirectClassifier DirectRegressor SuccessiveHalvingClassifier SuccessiveHavingRegressor TrainSizingClassifier TrainSizingRegressor transformer : An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes.","title":"Nomenclature"},{"location":"user_guide/#first-steps","text":"You can quickly install atom using pip or conda , see the installation guide . ATOM contains a variety of classes to perform data cleaning, feature engineering, model training, plotting and much more. The easiest way to use everything ATOM has to offer is through one of the main classes: ATOMClassifier for binary or multiclass classification tasks. ATOMRegressor for regression tasks. These two classes are convenient wrappers for the whole machine learning pipeline. Like a sklearn Pipeline , they assemble several steps that can be cross-validated together while setting different parameters. 
There are some important differences with sklearn's API: atom is initialized with the data you want to manipulate. This data can be accessed at any moment through atom's data attributes . The classes in ATOM's API are reached through atom's methods. For example, calling the encode method will initialize an Encoder instance, fit it on the training set and transform the whole dataset. The transformations are applied immediately after calling the method (there is no fit method). This approach gives the user a clearer overview and more control over every step in the pipeline. Let's get started with an example! First, initialize atom and provide it the data you want to use. You can either input a dataset and let ATOM split the train and test set or provide a train and test set already split. Note that if a dataframe is provided, the indices are reset by atom. atom = ATOMClassifier(X, y, test_size=0.25) Apply data cleaning methods through the class. For example, calling the impute method will handle all missing values in the dataset. atom.impute(strat_num=\"median\", strat_cat=\"most_frequent\", min_frac_rows=0.1) Select the best hyperparameters and fit a Random Forest and AdaBoost model. atom.run([\"RF\", \"AdaB\"], metric=\"accuracy\", n_calls=25, n_initial_points=10) Analyze the results: atom.feature_importances(show=10, filename=\"feature_importance_plot\") atom.plot_prc(title=\"Precision-recall curve comparison plot\")","title":"First steps"},{"location":"user_guide/#data-pipelines","text":"It may happen that you want to compare how a model performs on different datasets. For example, on one dataset balanced with an undersampling strategy and the other with an oversampling strategy. For this, atom has data pipelines.","title":"Data pipelines"},{"location":"user_guide/#branches","text":"Data pipelines manage separate paths atom's dataset can take. The paths are called branches and can be accessed through the branch attribute. Calling it will show the branches in the pipeline. The current branch is indicated with ! . A branch contains a specific dataset, and the transformers it took to arrive to that dataset from the one atom initialized with. Accessing data attributes such as atom.dataset will return the data in the current branch. Use the pipeline attribute to see the estimators in the branch. All data cleaning, feature engineering and trainers called will use the dataset in the current branch. This means that models are trained and validated on the data in that branch. Don't change the data in a branch after fitting a model, this can cause unexpected model behaviour. Instead, create a new branch for every unique model pipeline. By default, atom starts with one branch called \"master\". To start a new branch, set a new name to the property, e.g. atom.branch = \"new_branch\" . This will start a new branch from the current one. To create a branch from any other branch type \"_from_\" between the new name and the branch from which to split, e.g. atom.branch = \"branch2_from_branch1\" will create branch \"branch2\" from branch \"branch1\". To switch between existing branches, just type the name of the desired branch, e.g. atom.branch = \"master\" to go back to the main branch. Note that every branch contains a unique copy of the whole dataset! Creating many branches can cause memory issues for large datasets. You can delete a branch either deleting the attribute, e.g. del atom.branch , or using the delete method, e.g. atom.branch.delete() . 
A branch can only be deleted if no models were trained on its dataset. Use atom.branch.status() to print a list of the transformers and models in the branch. See the Imbalanced datasets or Feature engineering examples for branching use cases. Warning Always create a new branch if you want to change the dataset after fitting a model! Not doing so can cause unexpected model behaviour.","title":"Branches"},{"location":"user_guide/#data-transformations","text":"Performing data transformations is a common requirement of many datasets before they are ready to be ingested by a model. ATOM provides various classes to apply data cleaning and feature engineering transformations to the data. This tooling should be able to help you apply most of the typically needed transformations to get the data ready for modelling. For further fine-tuning, it is also possible to pre-process the data using custom transformers. They can be added to the pipeline using atom's add method. Remember that all transformations are only applied to the dataset in the current branch.","title":"Data transformations"},{"location":"user_guide/#automl","text":"Automated machine learning (AutoML) automates the selection, composition and parameterization of machine learning pipelines. Automating the machine learning process makes it more user-friendly and often provides faster, more accurate outputs than hand-coded algorithms. ATOM uses the TPOT package for AutoML optimization. TPOT uses a genetic algorithm to intelligently explore thousands of possible pipelines in order to find the best one for your data. Such an algorithm can be started through the automl method. The resulting data transformers and final estimator are merged with atom's pipeline (check the pipeline and models attributes after the method finishes running). Warning AutoML algorithms aren't intended to run for only a few minutes. If left to its default parameters, the method can take a very long time to finish!","title":"AutoML"},{"location":"user_guide/#data-cleaning","text":"More often than not, you need to do some data cleaning before fitting your dataset to a model. Usually, this involves importing different libraries and writing many lines of code. Since ATOM is all about fast exploration and experimentation, it provides various data cleaning classes to apply the most common transformations fast and easy. Note All of atom's data cleaning methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.scale(verbose=2) . Note Like the add method, the data cleaning methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.scale(columns=[0, 1]) .","title":"Data cleaning"},{"location":"user_guide/#scaling-the-feature-set","text":"Standardization of a dataset is a common requirement for many machine learning estimators; they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). The Scaler class let you quickly scale atom's dataset using one of sklearn's scalers. It can be accessed from atom through the scale method.","title":"Scaling the feature set"},{"location":"user_guide/#standard-data-cleaning","text":"There are many data cleaning steps that are useful to perform on any dataset before modelling. These are general rules that apply almost on every use-case and every task. 
The Cleaner class is a convenient tool to apply such steps. It can be accessed from atom through the clean method. Use the class' parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column.","title":"Standard data cleaning"},{"location":"user_guide/#imputing-missing-values","text":"For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. Such datasets however are incompatible with ATOM's models which assume that all values in an array are numerical, and that all have and hold meaning. The Imputer class handles missing values in the dataset by either dropping or imputing the value. It can be accessed from atom through the impute method. Tip Use atom's missing attribute to check the amount of missing values per feature.","title":"Imputing missing values"},{"location":"user_guide/#encoding-categorical-features","text":"Many datasets will contain categorical features. Their variables are typically stored as text values which represent various traits. Some examples include color (\u201cRed\u201d, \u201cYellow\u201d, \u201cBlue\u201d), size (\u201cSmall\u201d, \u201cMedium\u201d, \u201cLarge\u201d) or geographic designations (city or country). Regardless of what the value is used for, the challenge is determining how to use this data in the analysis. ATOM's models don't support direct manipulation of this kind of data. Use the Encoder class to encode categorical features to numerical values. It can be accessed from atom through the encode method. Tip Use atom's categorical attribute for a list of the categorical features in the dataset.","title":"Encoding categorical features"},{"location":"user_guide/#handling-outliers","text":"When modeling, it is important to clean the data sample to ensure that the observations best represent the problem. Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers. Often, machine learning modeling and model skill in general can be improved by understanding and even removing these outlier samples. The Pruner class offers 5 different strategies to detect outliers (described hereunder). It can be accessed from atom through the prune method. Tip Use atom's outliers attribute to check the number of outliers per column. z-score The z-score of a value in the dataset is defined as the number of standard deviations by which the value is above or below the mean of the column. Values above or below a certain threshold (specified with the parameter max_sigma ) are considered outliers. Note that, contrary to the rest of the strategies, this approach selects outlier values, not outlier samples! Because of this, it is possible to replace the outlier value instead of simply dropping the sample. Isolation Forest Uses a tree-based anomaly detection algorithm. It is based on modeling the normal data in such a way as to isolate anomalies that are both few and different in the feature space. Read more in sklearn's documentation . Elliptic Envelope If the input variables have a Gaussian distribution, then simple statistical methods can be used to detect outliers. 
For example, if the dataset has two input variables and both are Gaussian, then the feature space forms a multi-dimensional Gaussian and knowledge of this distribution can be used to identify values far from the distribution. This approach can be generalized by defining a hypersphere (ellipsoid) that covers the normal data, and data that falls outside this shape is considered an outlier. Read more in sklearn's documentation . Local Outlier Factor A simple approach to identifying outliers is to locate those examples that are far from the other examples in the feature space. This can work well for feature spaces with low dimensionality (few features) but becomes less reliable as the number of features is increased. This is referred to as the curse of dimensionality. The local outlier factor is a technique that attempts to harness the idea of nearest neighbors for outlier detection. Each example is assigned a scoring of how isolated or how likely it is to be outliers based on the size of its local neighborhood. Those examples with the largest score are more likely to be outliers. Read more in sklearn's documentation . One-class SVM The support vector machine algorithm developed initially for binary classification can be used for one-class classification. When modeling one class, the algorithm captures the density of the majority class and classifies examples on the extremes of the density function as outliers. This modification of SVM is referred to as One-Class SVM. Read more in sklearn's documentation . DBSCAN The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means which assumes that clusters are convex shaped. Samples that lie outside any cluster are considered outliers. Read more in sklearn's documentation . OPTICS The OPTICS algorithm shares many similarities with the DBSCAN algorithm, and can be considered a generalization of DBSCAN that relaxes the eps requirement from a single value to a value range. The key difference between DBSCAN and OPTICS is that the OPTICS algorithm builds a reachability graph, and a spot within the cluster ordering. These two attributes are assigned when the model is fitted, and are used to determine cluster membership. Read more in sklearn's documentation .","title":"Handling outliers"},{"location":"user_guide/#balancing-the-data","text":"One of the common issues found in datasets that are used for classification is imbalanced classes. Data imbalance usually reflects an unequal distribution of classes within a dataset. For example, in a credit card fraud detection dataset, most of the transactions are non-fraud, and a very few cases are fraud. This leaves us with a very unbalanced ratio of fraud vs non-fraud cases. The Balancer class can oversample the minority class or undersample the majority class using any of the transformers implemented in imblearn . It can be accessed from atom through the balance method.","title":"Balancing the data"},{"location":"user_guide/#feature-engineering","text":"\"Applied machine learning\" is basically feature engineering. ~ Andrew Ng. Feature engineering is the process of creating new features from the existing ones, in order to capture relationships with the target column that the first set of features didn't had on their own. This process is very important to improve the performance of machine learning algorithms. 
Although feature engineering works best when the data scientist applies use-case specific transformations, there are ways to do this in an automated manner, without prior domain knowledge. One of the problems of creating new features without human expert intervention, is that many of the newly created features can be useless, i.e. they do not help the algorithm to make better predictions. Even worse, having useless features can drop your performance. To avoid this, we perform feature selection, a process in which we select the relevant features in the dataset. See the Feature engineering example. Note All of atom's feature engineering methods automatically adopt the relevant transformer attributes ( n_jobs , verbose , logger , random_state ) from atom. A different choice can be added as parameter to the method call, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, n_jobs=4) . Note Like the add method, the feature engineering methods accept the columns parameter to only transform a subset of the dataset's features, e.g. atom.feature_selection(\"SFM\", solver=\"LGB\", n_features=10, columns=slice(5, 15)) .","title":"Feature engineering"},{"location":"user_guide/#generating-new-features","text":"The FeatureGenerator class creates new non-linear features based on the original feature set. It can be accessed from atom through the feature_generation method. You can choose between two strategies: Deep Feature Synthesis and Genetic Feature Generation. Deep Feature Synthesis Deep feature synthesis (DFS) applies the selected operators on the features in the dataset. For example, if the operator is \"log\", it will create the new feature LOG(old_feature) and if the operator is \"mul\", it will create the new feature old_feature_1 x old_feature_2 . The operators can be chosen through the operators parameter. Available options are: add: Sum two features together. sub: Subtract two features from each other. mul: Multiply two features with each other. div: Divide two features with each other. srqt: Take the square root of a feature. log: Take the logarithm of a feature. sin: Calculate the sine of a feature. cos: Calculate the cosine of a feature. tan: Calculate the tangent of a feature. ATOM's implementation of DFS uses the featuretools package. Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task. Genetic Feature Generation Genetic feature generation (GFG) uses genetic programming , a branch of evolutionary programming, to determine which features are successful and create new ones based on those. Where DFS can be seen as some kind of \"brute force\" for feature engineering, GFG tries to improve its features with every generation of the algorithm. GFG uses the same operators as DFS, but instead of only applying the transformations once, it evolves them further, creating complicated non-linear combinations of features with many transformations. The new features are given the name Feature N for the N-th feature. 
You can access the genetic features' fitness and description (how they are calculated) through the genetic_features attribute. ATOM uses the SymbolicTransformer class from the gplearn package for the genetic algorithm. Read more about this implementation here . Warning GFG can be slow for very large populations!","title":"Generating new features"},{"location":"user_guide/#selecting-useful-features","text":"The FeatureSelector class provides tooling to select the relevant features from a dataset. It can be accessed from atom through the feature_selection method. The following strategies are implemented: univariate, PCA, SFM, RFE and RFECV. Univariate Univariate feature selection works by selecting the best features based on a univariate statistical F-test. The test is provided via the solver parameter. It takes any function taking two arrays (X, y), and returning arrays (scores, p-values). Read more in sklearn's documentation . Principal Components Analysis Applying PCA will reduce the dimensionality of the dataset by maximizing the variance of each dimension. The new features are called Component 1, Component 2, etc... The data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Read more in sklearn's documentation . Selection from model SFM uses an estimator with feature_importances_ or coef_ attributes to select the best features in a dataset based on importance weights. The estimator is provided through the solver parameter and can be already fitted. ATOM allows you to use one of its predefined models , e.g. solver=\"RF\" . If you didn't call the FeatureSelector through atom, don't forget to indicate the estimator's task by adding _class or _reg after the name, e.g. RF_class to use a random forest classifier. Read more in sklearn's documentation . Recursive feature elimination Select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached. Note that, since RFE needs to fit the model again every iteration, this method can be fairly slow. RFECV applies the same algorithm as RFE but uses a cross-validated metric (under the scoring parameter, see RFECV ) to assess every step's performance. Also, where RFE returns the number of features selected by n_features , RFECV returns the number of features that achieved the optimal score on the specified metric. Note that this is not always equal to the amount specified by n_features . Read more in sklearn's documentation . Removing features with low variance Variance is the expectation of the squared deviation of a random variable from its mean. Features with low variance have many values repeated, which means the model will not learn much from them. FeatureSelector removes all features where the same value is repeated in at least max_frac_repeated fraction of the rows. The default option is to remove a feature if all values in it are the same. Read more in sklearn's documentation . Removing features with multi-collinearity Two features that are highly correlated are redundant, i.e. the two together will not contribute more to the model than only one of them. 
FeatureSelector will drop a feature that has a Pearson correlation coefficient larger than max_correlation with another feature. A correlation of 1 means the two columns are equal. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE and RFECV strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.","title":"Selecting useful features"},{"location":"user_guide/#models","text":"","title":"Models"},{"location":"user_guide/#predefined-models","text":"ATOM provides 31 models for classification and regression tasks that can be used to fit the data in the pipeline. After fitting, every model class is attached to the trainer as an attribute. We refer to these \"subclasses\" as models (see the nomenclature ). The classes contain a variety of attributes and methods to help you understand how the underlying estimator performed. They can be accessed using their acronyms, e.g. atom.LGB to access the LightGBM's model. The available models and their corresponding acronyms are: \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Classification/Regression \"Lasso\" for Lasso Regression \"EN\" for Elastic Net \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost \"LGB\" for LightGBM \"CatB\" for CatBoost \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron Tip The acronyms are case insensitive. You can also use lowercase to call the models, e.g. atom.lgb . Warning The models should not be initialized by the user! Only use them through the trainers.","title":"Predefined models"},{"location":"user_guide/#custom-models","text":"It is also possible to use your own models in ATOM's pipeline. For example, imagine we want to use sklearn's Lars estimator (note that is not included in ATOM's predefined models ). There are two ways to achieve this: Using ATOMModel (recommended). With this approach you can pass the required model characteristics to the pipeline. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel model = ATOMModel(models=Lars, fullname=\"Lars Regression\", needs_scaling=True, type=\"linear\") atom = ATOMRegressor(X, y) atom.run(model) Using the estimator's class or an instance of the class. This approach will also call ATOMModel under the hood, but it will leave its parameters to their default values. from sklearn.linear_model import Lars from atom import ATOMRegressor, ATOMModel atom = ATOMRegressor(X, y) atom.run(models=Lars) Additional things to take into account: Custom models are not restricted to sklearn estimators, but they should follow sklearn's API , i.e. 
have a fit and predict method. Parameter customization (for the initializer) is only possible for custom models which provide an estimator's class or an instance that has a set_params() method, i.e. it's a child class of BaseEstimator . Hyperparameter optimization for custom models is ignored unless appropriate dimensions are provided through bo_params . If the estimator has an n_jobs and/or random_state parameter that is left to its default value, it will automatically adopt the values from the trainer it's called from.","title":"Custom models"},{"location":"user_guide/#deep-learning","text":"Deep learning models can be used through ATOM's custom models as long as they follow sklearn's API . For example, models implemented with the Keras package should use the sklearn wrappers KerasClassifier or KerasRegressor . Many deep learning models, for example in computer vision and natural language processing, use datasets with more than 2 dimensions, e.g. image data can have shape (n_samples, length, width, rgb). These data structures are not intended to be stored in a two-dimensional pandas dataframe. Since ATOM requires a dataframe as instance for the dataset, multidimensional data sets are stored in a single column called \"Features\" where every row contains one (multidimensional) sample. Note that, because of this, the data cleaning , feature engineering and some of the plotting methods are unavailable for deep learning datasets. See in this example how to use ATOM to train and validate a Convolutional Neural Network implemented with Keras.","title":"Deep learning"},{"location":"user_guide/#training","text":"The training phase is where the models are fitted and evaluated. After this, the models are attached to the trainer and you can use the plotting and predicting methods. The pipeline applies the following steps iteratively for all models: The optimal hyperparameters are selected. The model is trained on the training set and evaluated on the test set. The bagging algorithm is applied. There are three approaches to run the training. Direct training: DirectClassifier DirectRegressor Training via successive halving : SuccessiveHalvingClassifier SuccessiveHalvingRegressor Training via train sizing : TrainSizingClassifier TrainSizingRegressor The direct fashion repeats the aforementioned steps only once, while the other two approaches repeat them more than once. Every approach can be directly called from atom through the run , successive_halving and train_sizing methods respectively. Models are called through their acronyms , e.g. atom.run(models=\"RF\") will train a Random Forest . If you want to run the same model multiple times, add a tag after the acronym to differentiate them. atom.run(models=[\"RF1\", \"RF2\"], est_params={\"RF1\": {\"n_estimators\": 100}, \"RF2\": {\"n_estimators\": 200}}) For example, this pipeline will fit two Random Forest models, one with 100 and the other with 200 decision trees. The models can be accessed through atom.rf1 and atom.rf2 . Use tagged models to test how the same model performs when fitted with different parameters or on different data sets. See the Imbalanced datasets example. Additional things to take into account: Models that need feature scaling will do so automatically before training if they are not already scaled. If an exception is encountered while fitting an estimator, the pipeline will automatically jump to the next model. The errors are stored in the errors attribute. 
Note that in case a model is skipped, there will be no model subclass for that estimator. When showing the final results, a ! indicates the highest score and a ~ indicates that the model is possibly overfitting (training set has a score at least 20% higher than the test set). The winning model (the one with the highest mean_bagging or metric_test ) can be accessed through the winner attribute.","title":"Training"},{"location":"user_guide/#metric","text":"ATOM uses sklearn's SCORERS for model selection and evaluation. A scorer consists of a metric function and some parameters that define the scorer's properties such as it's a score or loss function or if the function needs probability estimates or rounded predictions (see make_scorer ). ATOM lets you define the scorer for the pipeline in three ways: The metric parameter is one of sklearn's predefined scorers (as string). The metric parameter is a score (or loss) function with signature metric(y, y_pred, **kwargs). In this case, use the greater_is_better , needs_proba and needs_threshold parameters to specify the scorer's properties. The metric parameter is a scorer object. Note that all scorers follow the convention that higher return values are better than lower return values. Thus, metrics which measure the distance between the model and the data (i.e. loss functions), like max_error or mean_squared_error , will return the negated value of the metric. Custom scorer acronyms Since some of sklearn's scorers have quite long names and ATOM is all about lazy fast experimentation, the package provides acronyms for some of the most commonly used ones. These acronyms are case-insensitive and can be used in the metric parameter instead of the scorer's full name, e.g. atom.run(\"LR\", metric=\"BA\") will use balanced_accuracy . The available acronyms are: \"AP\" for \"average_precision\" \"BA\" for \"balanced_accuracy\" \"AUC\" for \"roc_auc\" \"LogLoss\" for \"neg_log_loss\" \"EV\" for \"explained_variance\" \"ME\" for \"max_error\" \"MAE\" for \"neg_mean_absolute_error\" \"MSE\" for \"neg_mean_squared_error\" \"RMSE\" for \"neg_root_mean_squared_error\" \"MSLE\" for \"neg_mean_squared_log_error\" \"MEDAE\" for \"neg_median_absolute_error\" \"POISSON\" for \"neg_mean_poisson_deviance\" \"GAMMA\" for \"neg_mean_gamma_deviance\" Multi-metric runs Sometimes it is useful to measure the performance of the models in more than one way. ATOM lets you run the pipeline with multiple metrics at the same time. To do so, provide the metric parameter with a list of desired metrics, e.g. atom.run(\"LDA\", metric=[\"r2\", \"mse\"]) . If you provide metric functions, don't forget to also provide lists to the greater_is_better , needs_proba and needs_threshold parameters, where the n-th value in the list corresponds to the n-th function. If you leave them as a single value, that value will apply to every provided metric. When fitting multi-metric runs, the resulting scores will return a list of metrics. For example, if you provided three metrics to the pipeline, atom.knn.metric_bo could return [0.8734, 0.6672, 0.9001]. It is also important to note that only the first metric of a multi-metric run is used to evaluate every step of the bayesian optimization and to select the winning model. 
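As a sketch, a multi-metric run that combines a predefined scorer with a custom metric function (the function name and settings are illustrative, and atom is assumed to already hold the data):

from sklearn.metrics import fbeta_score

def f2_score(y_true, y_pred):
    # Score function with the required signature metric(y, y_pred)
    return fbeta_score(y_true, y_pred, beta=2)

# greater_is_better is left as a single value, so it applies to both metrics
atom.run(\"LR\", metric=[\"f1\", f2_score], greater_is_better=True)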
Tip Some plots let you choose which of the metrics to show using the metric parameter.","title":"Metric"},{"location":"user_guide/#parameter-customization","text":"By default, the parameters every estimator uses are the same default parameters they get from their respective packages. To select different ones, use est_params . There are two ways to add custom parameters to the models: adding them directly to the dictionary as key-value pairs or through multiple dicts with the model names as keys. Adding the parameters directly to est_params will share them across all models in the pipeline. In this example, both the XGBoost and the LightGBM model will use n_estimators=200. Make sure all the models do have the specified parameters or an exception will be raised! atom.run([\"XGB\", \"LGB\"], est_params={\"n_estimators\": 200}) To specify parameters per model, use the model name as key and a dict of the parameters as value. In this example, the XGBoost model will use n_estimators=200 and the Multi-layer Perceptron will use one hidden layer with 75 neurons. atom.run([\"XGB\", \"MLP\"], est_params={\"XGB\": {\"n_estimators\": 200}, \"MLP\": {\"hidden_layer_sizes\": (75,)}}) Some estimators allow you to pass extra parameters to the fit method (besides X and y). This can be done by adding _fit at the end of the parameter's name. For example, to change XGBoost's verbosity, we can run: atom.run(\"XGB\", est_params={\"verbose_fit\": True}) Note If a parameter is specified through est_params , it is ignored by the bayesian optimization!","title":"Parameter customization"},{"location":"user_guide/#hyperparameter-optimization","text":"In order to achieve maximum performance, we need to tune an estimator's hyperparameters before training it. ATOM provides hyperparameter tuning using a bayesian optimization (BO) approach implemented by skopt . The BO is optimized on the first metric provided with the metric parameter. Each step is either computed by cross-validation on the complete training set or by randomly splitting the training set every iteration into a (sub) training set and a validation set. This process can create some data leakage but ensures maximal use of the provided data. The test set, however, does not contain any leakage and is used to determine the final score of every model. Note that, if the dataset is relatively small, the BO's best score can consistently be lower than the final score on the test set (despite the leakage) due to the considerably fewer instances on which it is trained. There are many possibilities to tune the BO to your liking. Use n_calls and n_initial_points to determine the number of iterations that are performed randomly at the start (exploration) and the number of iterations spent optimizing (exploitation). If n_calls is equal to n_initial_points , every iteration of the BO will select its hyperparameters randomly. This means the algorithm is technically performing a random search . Note The n_calls parameter includes the iterations in n_initial_points , i.e. calling atom.run(\"LR\", n_calls=20, n_initial_points=10) will run 20 iterations of which the first 10 are random. Note If n_initial_points=1 , the first trial is equal to the estimator's default parameters. Other settings can be changed through the bo_params parameter, a dictionary where every key-value combination can be used to further customize the BO. By default, the hyperparameters and corresponding dimensions per model are predefined by ATOM. Use the dimensions key to use custom ones. 
Just like with est_params , you can share the same dimensions across models or use a dictionary with the model names as keys to specify the dimensions for every individual model. Note that the provided search space dimensions must be compliant with skopt's API. atom.run(\"LR\", n_calls=10, bo_params={\"dimensions\": [Integer(100, 1000, name=\"max_iter\")]}) The majority of skopt's callbacks to stop the optimizer early can be accessed through bo_params . You can include other callbacks using the callbacks key. atom.run(\"LR\", n_calls=10, bo_params={\"max_time\": 1000, \"callbacks\": custom_callback()}) You can also include other parameters for the optimizer as key-value pairs. atom.run(\"LR\", n_calls=10, bo_params={\"acq_func\": \"EI\"})","title":"Hyperparameter optimization"},{"location":"user_guide/#bagging","text":"After fitting the estimator, you can assess the robustness of the model using bootstrap aggregating (bagging). This technique creates several new data sets by selecting random samples from the training set (with replacement) and evaluates them on the test set. This way we get a distribution of the performance of the model. The number of sets can be chosen through the bagging parameter. Tip Use the plot_results method to plot the bagging scores in a boxplot.","title":"Bagging"},{"location":"user_guide/#early-stopping","text":"XGBoost , LightGBM and CatBoost allow in-training evaluation. This means that the estimator is evaluated after every round of the training, and that the training is stopped early if it didn't improve in the last early_stopping rounds. This can save the pipeline much time that would otherwise be wasted on an estimator that is unlikely to improve further. Note that this technique is applied both during the BO and at the final fit on the complete training set. There are two ways to apply early stopping on these models: Through the early_stopping key in bo_params . This approach applies early stopping to all models in the pipeline and allows the input of a fraction of the total number of rounds. Filling the early_stopping_rounds parameter directly in est_params . Don't forget to add _fit to the parameter to call it from the fit method. After fitting, the model will get the evals attribute, a dictionary of the train and test performances per round (also if early stopping wasn't applied). Click here for an example using early stopping. Tip Use the plot_evals method to plot the in-training evaluation on the train and test set. ","title":"Early stopping"},{"location":"user_guide/#successive-halving","text":"Successive halving is a bandit-based algorithm that fits N models to 1/N of the data. The best half are selected to go to the next iteration where the process is repeated. This continues until only one model remains, which is fitted on the complete dataset. Beware that a model's performance can depend greatly on the amount of data on which it is trained. For this reason, we recommend using this technique only with similar models, e.g. only tree-based models. Use successive halving through the SuccessiveHalvingClassifier / SuccessiveHalvingRegressor classes or from atom via the successive_halving method. Consecutive runs of the same model are saved with the model's acronym followed by the number of models in the run. For example, a Random Forest in a run with 4 models would become model RF4 . Click here for a successive halving example. 
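For instance, a short sketch comparing four tree-based models with successive halving (the metric and bagging settings are illustrative):

# The first run fits 4 models on 1/4 of the data (Tree4, Bag4, ET4, RF4);
# the best half continues until a single model sees the full training set.
atom.successive_halving([\"Tree\", \"Bag\", \"ET\", \"RF\"], metric=\"f1\", bagging=5)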
Tip Use the plot_successive_halving method to see every model's performance per iteration of the successive halving.","title":"Successive halving"},{"location":"user_guide/#train-sizing","text":"When training models, there is usually a trade-off between model performance and computation time that is regulated by the number of samples in the training set. Train sizing can be used to create insights in this trade-off and help determine the optimal size of the training set, fitting the models multiple times, ever increasing the number of samples in the training set. Use train sizing through the TrainSizingClassifier / TrainSizingRegressor classes or from atom via the train_sizing method. The number of iterations and the number of samples per training can be specified with the train_sizes parameter. Consecutive runs of the same model are saved with the model's acronym followed by the fraction of rows in the training set (the . is removed from the fraction!). For example, a Random Forest in a run with 80% of the training samples would become model RF08 . Click here for a train sizing example. Tip Use the plot_learning_curve method to see the model's performance per size of the training set.","title":"Train sizing"},{"location":"user_guide/#voting","text":"The idea behind Voting is to combine the predictions of conceptually different models to make new predictions. Such a technique can be useful for a set of equally well performing models in order to balance out their individual weaknesses. Read more in sklearn's documentation . A Voting model is created from a trainer through the voting method. The Voting model is added automatically to the list of models in the pipeline, under the Vote acronym. Although similar, this model is different from the VotingClassifier and VotingRegressor estimators from sklearn. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. Plots that require an estimator object will raise an exception. The Voting class has the same prediction attributes and prediction methods as other models. The predict_proba , predict_log_proba , decision_function and score methods return the average predictions (soft voting) over the models in the instance. Note that these methods will raise an exception if not all estimators in the Voting instance have the specified method. The predict method returns the majority vote (hard voting). The scoring method also returns the average scoring for the selected metric over the models. Click here for a voting example. Warning Although it is possible to include models from different branches in the same Voting instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.","title":"Voting"},{"location":"user_guide/#stacking","text":"Stacking is a method for combining estimators to reduce their biases. More precisely, the predictions of each individual estimator are stacked together and used as input to a final estimator to compute the prediction. Read more in sklearn's documentation . A Stacking model is created from a trainer through the stacking method. The Stacking model is added automatically to the list of models in the pipeline, under the Stack acronym. Remember that the model is added to the plots if the models parameter is not specified. Plots that require a data set will use the one in the current branch. 
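For example, a minimal sketch that stacks two previously trained models (the model names and final estimator are illustrative):

# Combine the fitted RF and LGB models; LR is used as the final estimator
atom.stacking(models=[\"RF\", \"LGB\"], estimator=\"LR\")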
The prediction methods, the scoring method and the plot methods that require an estimator object will use the Stacking's final estimator, under the estimator attribute. Click here for a stacking example. Warning Although it is possible to include models from different branches in the same Stacking instance, this is highly discouraged. Data sets from different branches with unequal shape can result in unexpected errors for plots and prediction methods.","title":"Stacking"},{"location":"user_guide/#predicting","text":"After running a successful pipeline, you may want to apply all used transformations to new data, or make predictions using one of the trained models. Just like with a sklearn estimator, you can call the prediction methods from a fitted trainer, e.g. atom.predict(X) . Calling the method without specifying a model will use the winning model in the pipeline (under attribute winner ). To use a different model, simply call the method from a model, e.g. atom.KNN.predict(X) . All prediction methods transform the provided data through the data cleaning and feature engineering transformers before making the predictions. By default, this excludes outlier handling and balancing the dataset since these steps should only be applied on the training set. Use the method's kwargs to select which transformations to use in every call. The available prediction methods are a selection of the most common methods for estimators in sklearn's API: transform Transform new data through all transformers in a branch. predict Transform new data through all transformers in a branch and return class predictions. predict_proba Transform new data through all transformers in a branch and return class probabilities. predict_log_proba Transform new data through all transformers in a branch and return class log-probabilities. decision_function Transform new data through all transformers in a branch and return predicted confidence scores. score Transform new data through all transformers in a branch and return the model's score. Except for transform, the prediction methods can be calculated on the train and test set. You can access them through the model's prediction attributes, e.g. atom.mnb.predict_train or atom.mnb.predict_test . Keep in mind that the results are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Note Many of the plots use the prediction attributes. This can considerably increase the size of the class for large datasets. Use the reset_predictions method if you need to free some memory!","title":"Predicting"},{"location":"user_guide/#plots","text":"After fitting the models to the data, it's time to analyze the results. ATOM provides many plotting methods to compare the model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib , seaborn and shap for plotting. The plot methods can be called from a training directly, e.g. atom.plot_roc() , or from one of the models, e.g. atom.LGB.plot_roc() . If called from training , it will make the plot for all models in the pipeline. This can be useful to compare the results of multiple models. If called from a model, it will make the plot for only that model. 
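As a sketch (assuming an LGB model was trained; the model name is only an example):

atom.plot_roc()      # one figure comparing every model in the pipeline
atom.lgb.plot_roc()  # the same plot, but only for the LightGBM model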
Use this option if you want information just for that specific model or to make a plot less crowded.","title":"Plots"},{"location":"user_guide/#parameters","text":"Apart from the plot-specific parameters they may have, all plots have four parameters in common: The title parameter allows you to add a custom title to the plot. The figsize parameter adjusts the plot's size. The filename parameter is used to save the plot. The display parameter determines whether the plot is rendered.","title":"Parameters"},{"location":"user_guide/#aesthetics","text":"The plot aesthetics can be customized using the plot attributes, e.g. atom.style = \"white\" . These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. ATOM's default values are: style: \"darkgrid\" palette: \"GnBu_r_d\" title_fontsize: 20 label_fontsize: 16 tick_fontsize: 12 Use the reset_aesthetics method to reset all the aesthetics to their default value.","title":"Aesthetics"},{"location":"user_guide/#canvas","text":"Sometimes it might be desirable to draw multiple plots side by side in order to compare them more easily. Use atom's canvas method for this. The canvas method is a @contextmanager , i.e. it is used through the with command. Plots in a canvas will ignore the figsize, filename and display parameters. Instead, call these parameters from the canvas for the final figure. For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot could become too messy. atom = ATOMClassifier(X, y) atom.run([\"xgb\", \"lgb\"], n_calls=0) with atom.canvas(2, 2, title=\"XGBoost vs LightGBM\", filename=\"canvas\"): atom.xgb.plot_roc(dataset=\"both\", title=\"ROC - XGBoost\") atom.lgb.plot_roc(dataset=\"both\", title=\"ROC - LightGBM\") atom.xgb.plot_prc(dataset=\"both\", title=\"PRC - XGBoost\") atom.lgb.plot_prc(dataset=\"both\", title=\"PRC - LightGBM\")","title":"Canvas"},{"location":"user_guide/#shap","text":"The SHAP (SHapley Additive exPlanations) python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM implements methods to plot 7 of SHAP's plotting functions directly from its API. The seven plots are: bar_plot , beeswarm_plot , decision_plot , force_plot , heatmap_plot , scatter_plot and waterfall_plot . Since the plots are not made by ATOM, we can't draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot() . Info You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot .","title":"SHAP"},{"location":"user_guide/#available-plots","text":"A list of available plots can be found hereunder. Note that not all plots can be called from every class and that their availability can depend on the task at hand. plot_correlation Plot the data's correlation matrix. plot_scatter_matrix Plot the data's scatter matrix. plot_qq Plot a quantile-quantile plot. plot_distribution Plot column distributions. plot_pipeline Plot a diagram of every estimator in atom's pipeline. 
plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per components. plot_rfecv Plot the RFECV results. plot_successive_halving Plot of the models\" scores per iteration of the successive halving. plot_learning_curve Plot the model's learning curve. plot_results Plot a boxplot of the bagging's results. plot_bo Plot the bayesian optimization scoring. plot_evals Plot evaluation curves for the train and test set. plot_roc Plot the Receiver Operating Characteristics curve. plot_prc Plot the precision-recall curve. plot_permutation_importance Plot the feature permutation importance of models. plot_feature_importance Plot a tree-based model's feature importance. plot_partial_dependence Plot the partial dependence of features. plot_errors Plot a model's prediction errors. plot_residuals Plot a model's residuals. plot_confusion_matrix Plot a model's confusion matrix. plot_threshold Plot metric performances against threshold values. plot_probabilities Plot the probability distribution of the classes in the target column. plot_calibration Plot the calibration curve for a binary classifier. plot_gains Plot the cumulative gains curve. plot_lift Plot the lift curve. bar_plot Plot SHAP's bar plot. beeswarm_plot Plot SHAP's beeswarm plot. decision_plot Plot SHAP's decision plot. force_plot Plot SHAP's force plot. heatmap_plot Plot SHAP's heatmap plot. scatter_plot Plot SHAP's scatter plot. waterfall_plot Plot SHAP's waterfall plot.","title":"Available plots"},{"location":"API/ATOM/ATOMLoader/","text":"ATOMLoader function ATOMLoader (filename, data=None, transform_data=True, verbose=None) [source] Load a class instance from a pickle file. If the file is a trainer that was saved using save_data=False , you can load new data into it. For atom pickles, you can also apply all data transformations in the pipeline to the data. Parameters: filename: str Name of the pickle file to load. data: tuple of indexables or None, optional (default=None) Tuple containing the features and target data. Only use this parameter if the file is a trainer that was saved using save_data=False (see the save method). Allowed formats are: X or X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). If no y is provided, the last column is used as target. y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). transform_data: bool, optional (default=True) If False, the data is left as provided. If True, it is transformed through all the steps in the instance's pipeline. This parameter is ignored if the loaded file is not an atom pickle. verbose: int or None, optional (default=None) Verbosity level of the transformations applied on the new data. If None, use the verbosity from the loaded instance. This parameter is ignored if transform_data=False . 
Example from atom import ATOMClassifier, ATOMLoader # Save an atom instance to a pickle file atom = ATOMClassifier(X, y) atom.encode(strategy=\"Helmert\", max_onehot=12) atom.run(\"LR\", metric=\"AP\", n_calls=25, n_initial_points=10) atom.save(\"atom_lr\", save_data=False) # Load the class and add the transformed data to the new instance atom_2 = ATOMLoader(\"atom_lr\", data=(X, y), verbose=0)","title":"ATOMLoader"},{"location":"API/ATOM/ATOMLoader/#atomloader","text":"function ATOMLoader (filename, data=None, transform_data=True, verbose=None) [source] Load a class instance from a pickle file. If the file is a trainer that was saved using save_data=False , you can load new data into it. For atom pickles, you can also apply all data transformations in the pipeline to the data. Parameters: filename: str Name of the pickle file to load. data: tuple of indexables or None, optional (default=None) Tuple containing the features and target data. Only use this parameter if the file is a trainer that was saved using save_data=False (see the save method). Allowed formats are: X or X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). If no y is provided, the last column is used as target. y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). transform_data: bool, optional (default=True) If False, the data is left as provided. If True, it is transformed through all the steps in the instance's pipeline. This parameter is ignored if the loaded file is not an atom pickle. verbose: int or None, optional (default=None) Verbosity level of the transformations applied on the new data. If None, use the verbosity from the loaded instance. This parameter is ignored if transform_data=False .","title":"ATOMLoader"},{"location":"API/ATOM/ATOMLoader/#example","text":"from atom import ATOMClassifier, ATOMLoader # Save an atom instance to a pickle file atom = ATOMClassifier(X, y) atom.encode(strategy=\"Helmert\", max_onehot=12) atom.run(\"LR\", metric=\"AP\", n_calls=25, n_initial_points=10) atom.save(\"atom_lr\", save_data=False) # Load the class and add the transformed data to the new instance atom_2 = ATOMLoader(\"atom_lr\", data=(X, y), verbose=0)","title":"Example"},{"location":"API/ATOM/atomclassifier/","text":"ATOMClassifier class atom.api. ATOMClassifier (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMClassifier is ATOM's wrapper for binary and multiclass classification tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMClassifier instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. 
Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Magic methods The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset. Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. mapping: dict Target classes mapped to their respective encoded integer. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. classes: pd.DataFrame Distribution of classes per data set. n_classes: int Number of classes in the target column. Utility attributes Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Utility methods The ATOMClassifier class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. get_class_weight Return class weights for a balanced dataset. log Save information to the logger and print to stdout. 
report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's classifier. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. 
Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. 
This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Data cleaning ATOMClassifier provides data cleaning methods to scale your features and handle missing values, categorical columns, outliers and unbalanced datasets. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. balance Balance the target classes in the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the tranformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. 
The definition of outlier depends on the selected strategy and can greatly differ from one another. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. Ignores categorical columns. See Pruner for a description of the parameters. method balance (strategy=\"ADASYN\", **kwargs) [source] Balance the number of samples per target class in the target column. Only the training set is balanced in order to maintain the original distribution of target classes in the test set. See Balancer for a description of the parameters. Feature engineering To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified). Training The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectClassifier instance. 
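As an illustration, here is a minimal sketch of a direct run. It assumes atom is an initialized ATOMClassifier, "LR" and "QDA" are predefined model acronyms (both used in the example further down), and n_calls=0 is assumed to skip the Bayesian optimization so the default hyperparameters are fitted.

# Minimal sketch: fit two predefined models with default hyperparameters
atom.run(
    models=["LR", "QDA"],  # predefined model acronyms
    metric="f1",           # any sklearn scorer name is accepted
    n_calls=0,             # assumed to skip the Bayesian optimization
    bagging=5,             # score each model on 5 bootstrapped samples
)
print(atom.results)        # one row per model with train/test/bagging scores
print(atom.winner.name)    # best model on the test set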
method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingClassifier instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingClassifier instance. Example from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y=True) # Initialize class atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", max_sigma=2) atom.balance(strategy=\"smote\", sampling_strategy=0.7) # Fit the models to the data atom.run( models=[\"QDA\", \"CatB\"], metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_roc(figsize=(9, 6), filename=\"roc.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"LR\", metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"ATOMClassifier"},{"location":"API/ATOM/atomclassifier/#atomclassifier","text":"class atom.api. ATOMClassifier (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMClassifier is ATOM's wrapper for binary and multiclass classification tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMClassifier instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. 
Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"ATOMClassifier"},{"location":"API/ATOM/atomclassifier/#magic-methods","text":"The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset.","title":"Magic methods"},{"location":"API/ATOM/atomclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/ATOM/atomclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. mapping: dict Target classes mapped to their respective encoded integer. 
scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. classes: pd.DataFrame Distribution of classes per data set. n_classes: int Number of classes in the target column.","title":"Data attributes"},{"location":"API/ATOM/atomclassifier/#utility-attributes","text":"Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/ATOM/atomclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/ATOM/atomclassifier/#utility-methods","text":"The ATOMClassifier class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. get_class_weight Return class weights for a balanced dataset. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. 
stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's classifier. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. 
Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. 
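By way of illustration, a hedged sketch that combines a few of these utility methods. It assumes atom already contains trained models, including an "LR" model, and that the models drawn in the canvas expose feature importances.

# Draw two plots in a single figure via the canvas context manager
with atom.canvas(nrows=1, ncols=2, title="Model overview"):
    atom.plot_roc()                 # first plot in the canvas
    atom.plot_feature_importance()  # second plot (models exposing importances)

# Export the branch's transformers plus a final estimator as a sklearn Pipeline
pipe = atom.export_pipeline(model="LR")

# Inverse-frequency class weights, e.g. to pass to a custom estimator
weights = atom.get_class_weight(dataset="train")
atom.log(f"Class weights: {weights}", level=1)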
method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. 
weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Utility methods"},{"location":"API/ATOM/atomclassifier/#data-cleaning","text":"ATOMClassifier provides data cleaning methods to scale your features and handle missing values, categorical columns, outliers and unbalanced datasets. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. balance Balance the target classes in the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the transformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique <= max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. Ignores categorical columns. 
See Pruner for a description of the parameters. method balance (strategy=\"ADASYN\", **kwargs) [source] Balance the number of samples per target class in the target column. Only the training set is balanced in order to maintain the original distribution of target classes in the test set. See Balancer for a description of the parameters.","title":"Data cleaning"},{"location":"API/ATOM/atomclassifier/#feature-engineering","text":"To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified).","title":"Feature engineering"},{"location":"API/ATOM/atomclassifier/#training","text":"The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectClassifier instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingClassifier instance. 
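For orientation, a brief sketch of the two feature engineering steps chained together; the parameter values are illustrative only.

# Create new features with Deep Feature Synthesis, then keep the 10 best ones
atom.feature_generation(strategy="DFS", n_features=20)
atom.feature_selection(strategy="univariate", n_features=10, max_correlation=0.98)
print(atom.n_features)  # number of features left in the current branch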
method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingClassifier instance.","title":"Training"},{"location":"API/ATOM/atomclassifier/#example","text":"from sklearn.datasets import load_breast_cancer from atom import ATOMClassifier X, y = load_breast_cancer(return_X_y=True) # Initialize class atom = ATOMClassifier(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", max_sigma=2) atom.balance(strategy=\"smote\", sampling_strategy=0.7) # Fit the models to the data atom.run( models=[\"QDA\", \"CatB\"], metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_roc(figsize=(9, 6), filename=\"roc.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"LR\", metric=\"precision\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"Example"},{"location":"API/ATOM/atommodel/","text":"ATOMModel function ATOMModel (estimator, acronym=None, fullname=None, needs_scaling=False) [source] Convert an estimator to a model that can be ingested by atom. Parameters: estimator: class Model's estimator. Can be a class or an instance. acronym: str or None, optional (default=None) Model's acronym. Used to call the model from the trainer. If None, the capital letters in the estimator's __name__ are used (if 2 or more, else it uses the entire name). fullname: str or None, optional (default=None) Full model's name. If None, the estimator's __name__ is used. needs_scaling: bool, optional (default=False) Whether the model needs scaled features. Can not be True for deep learning datasets. Example from atom import ATOMRegressor, ATOMModel from sklearn.linear_model import HuberRegressor model = ATOMModel(HuberRegressor, name=\"hub\", fullname=\"Huber\", needs_scaling=True) atom = ATOMRegressor(X, y) atom.run(model) atom.hub.predict(X_new)","title":"ATOMModel"},{"location":"API/ATOM/atommodel/#atommodel","text":"function ATOMModel (estimator, acronym=None, fullname=None, needs_scaling=False) [source] Convert an estimator to a model that can be ingested by atom. Parameters: estimator: class Model's estimator. Can be a class or an instance. acronym: str or None, optional (default=None) Model's acronym. Used to call the model from the trainer. If None, the capital letters in the estimator's __name__ are used (if 2 or more, else it uses the entire name). fullname: str or None, optional (default=None) Full model's name. If None, the estimator's __name__ is used. needs_scaling: bool, optional (default=False) Whether the model needs scaled features. Can not be True for deep learning datasets.","title":"ATOMModel"},{"location":"API/ATOM/atommodel/#example","text":"from atom import ATOMRegressor, ATOMModel from sklearn.linear_model import HuberRegressor model = ATOMModel(HuberRegressor, name=\"hub\", fullname=\"Huber\", needs_scaling=True) atom = ATOMRegressor(X, y) atom.run(model) atom.hub.predict(X_new)","title":"Example"},{"location":"API/ATOM/atomregressor/","text":"ATOMRegressor class atom.api. 
ATOMRegressor (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_features, n_samples). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Magic methods The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset. 
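To make the magic methods concrete, a short sketch follows; the column name "age" is hypothetical.

print(atom)               # __repr__: overview of branches, models, metric and errors
print(len(atom))          # __len__: number of transformers in the pipeline
print("age" in atom)      # __contains__: is "age" a column in the dataset?
for transformer in atom:  # __iter__: loop over the pipeline's transformers
    print(transformer)
first_step = atom[0]      # __getitem__ with an int: the i-th transformer
target_col = atom["age"]  # __getitem__ with a str: a column of the dataset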
Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. n_outliers: int Number of samples in the training set containing outliers. Utility attributes Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . 
title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Utility methods The ATOMRegressor class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. 
The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's regressor. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. 
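A small sketch of how these inspection utilities fit together; the filename is omitted, so the report is not written to disk.

atom.log("Inspecting the raw data", level=1)         # sent to the logger and stdout
stats = atom.distribution(column=0)                  # KS-statistic per distribution
print(stats)
profile = atom.report(dataset="train", n_rows=5000)  # profile a 5000-row sample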
method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Data cleaning ATOMRegressor provides data cleaning methods to scale your features and handle missing values, categorical columns and outliers. Calling on one of them will automatically apply the method on the dataset in the pipeline. 
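A sketch of a typical cleaning sequence on an ATOMRegressor instance follows; the strategy values ("median", "most_frequent") are assumptions, see Imputer and Encoder for the accepted options.

atom.clean(drop_duplicates=True)                            # standard cleaning steps
atom.impute(strat_num="median", strat_cat="most_frequent")  # assumed strategy values
atom.encode(strategy="LeaveOneOut", max_onehot=10)
atom.prune(strategy="z-score", max_sigma=3)                 # training set only
atom.scale()                                                # standard scaling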
Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the tranformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one each other. Ignores categorical columns. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. Ignores categorical columns. See Pruner for a description of the parameters. Feature engineering To further pre-process the data, you can create new non-linear features transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. 
method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified). Training The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectRegressor instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingRegressor instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingRegressor instance. 
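As a supplement to the example below, a hedged sketch of train_sizing with an explicit train_sizes grid; the model acronyms are taken from the example, "r2" is a standard sklearn scorer, and n_calls=0 is assumed to skip the Bayesian optimization.

import numpy as np

# Evaluate how the scores evolve with the size of the training set
atom.train_sizing(
    models=["OLS", "BR"],
    metric="r2",
    train_sizes=np.linspace(0.1, 1.0, 10),  # ten fractions of the training set
    n_calls=0,                              # assumed to skip the Bayesian optimization
    bagging=5,
)
print(atom.results)  # one row per model and training size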
Example from sklearn.datasets import load_boston from atom import ATOMRegressor X, y = load_boston(return_X_y=True) # Initialize class atom = ATOMRegressor(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2, include_target=True) # Fit the models to the data atom.run( models=[\"OLS\", \"BR\", \"CatB\"], metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_errors(figsize=(9, 6), filename=\"errors.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"MLP\", metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"ATOMRegressor"},{"location":"API/ATOM/atomregressor/#atomregressor","text":"class atom.api. ATOMRegressor (*arrays, y=-1, n_rows=1, test_size=0.2, logger=None, n_jobs=1, warnings=True, verbose=0, random_state=None) [source] ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already contains the dataset on which we want to perform the analysis. Calling a method will automatically apply it on the dataset it contains. You can predict , plot and call any model from atom. Read more in the user guide . Parameters: *arrays: sequence of indexables Dataset containing features and target. Allowed formats are: X X, y train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) X, train, test: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). y: int, str or sequence, optional (default=-1) Target column in X. Ignored if provided through arrays . If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). n_rows: int or float, optional (default=1) If < =1: Fraction of the dataset to use. If >1: Number of rows to use (only if input is X, y). test_size: int, float, optional (default=0.2) If < =1: Fraction of the dataset to include in the test set. If >1: Number of rows to include in the test set. This parameter is ignored if the train and test set are provided. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. warnings: bool or str, optional (default=True) If True: Default warning action (equal to \"default\"). If False: Suppress all warnings (equal to \"ignore\"). If str: One of the actions in python's warnings environment. Note that changing this parameter will affect the PYTHONWARNINGS environment. Note that ATOM can't manage warnings that go directly from C++ code to the stdout/stderr. 
logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"ATOMRegressor"},{"location":"API/ATOM/atomregressor/#magic-methods","text":"The class contains some magic methods to help you access some of its elements faster. Note that methods that apply on the pipeline can return different results per branch. __repr__: Prints an overview of atom's branches, models, metric and errors. __len__: Returns the length of the pipeline. __iter__: Iterate over the pipeline's transformers. __contains__: Checks if the provided item is a column in the dataset. __getitem__: If int, return the i-th transformer in the pipeline. If str, access a column in the dataset.","title":"Magic methods"},{"location":"API/ATOM/atomregressor/#attributes","text":"","title":"Attributes"},{"location":"API/ATOM/atomregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Changing the branch will also change the response from these attributes accordingly. Attributes: pipeline: pd.Series Series containing all transformers fitted on the data in the current branch. Use this attribute only to access the individual instances. To visualize the pipeline, use the status method from the branch or the plot_pipeline method. feature_importance: list Features ordered by most to least important. This attribute is created after running the feature_selection , plot_permutation_importance or plot_feature_importance methods. dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. scaled: bool Whether the feature set is scaled. It is considered scaled when it has mean=0 and std=1, or when atom has a scaler in the pipeline. duplicates: int Number of duplicate rows in the dataset. nans: pd.Series Columns with the number of missing values in them. n_nans: int Number of samples containing missing values. numerical: list Names of the numerical features in the dataset. n_numerical: int Number of numerical features in the dataset. categorical: list Names of the categorical features in the dataset. n_categorical: int Number of categorical features in the dataset. outliers: pd.Series Columns in training set with amount of outlier values. 
n_outliers: int Number of samples in the training set containing outliers.","title":"Data attributes"},{"location":"API/ATOM/atomregressor/#utility-attributes","text":"Attributes: missing: list List of values that are considered \"missing\" (used by the clean and impute methods). Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/ATOM/atomregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/ATOM/atomregressor/#utility-methods","text":"The ATOMRegressor class contains a variety of methods to help you handle the data and inspect the pipeline. add Add a transformer to the current branch. apply Apply a function to the dataset. automl Use AutoML to search for an optimized pipeline. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. distribution Get statistics on a column's distribution. drop Drop columns from the dataset. export_pipeline Export atom's pipeline to a sklearn's Pipeline object. log Save information to the logger and print to stdout. report Get an extensive profile analysis of the data. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. save Save the instance to a pickle file. save_data Save data to a csv file. scoring Returns the scores of the models for a specific metric. stacking Add a Stacking instance to the models in the pipeline. stats Print out a list of basic statistics on the dataset. status Get an overview of atom's branches, models and errors. voting Add a Voting instance to the models in the pipeline. method add (transformer, columns=None, train_only=False) [source] Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's pipeline. If the transformer is a sklearn Pipeline, every transformer is merged independently with atom. Note If the transformer doesn't return a dataframe, the column naming happens as follows. If the transformer returns the same number of columns, the names are kept equal. If the number of columns change, old columns will keep their name (as long as the column is unchanged) and new columns will receive the name Feature n , where n stands for the n-th feature. 
This means that a transformer should only transform, add or drop columns, not combinations of these. Note If the transformer has a n_jobs and/or random_state parameter and it is left to its default value, it adopts atom's value. Parameters: transformer: estimator Transformer to add to the pipeline. Should implement a transform method. columns: int, str, slice, sequence or None, optional (default=None) Names or indices of the columns in the dataset to transform. If None, transform all columns. train_only: bool, optional (default=False) Whether to apply the transformer only on the train set or on the complete dataset. method apply (func, column) [source] Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the dataset, a new column with that name is created. The input of function is the complete dataset as pd.DataFrame. Note This approach is preferred over changing the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: func: callable Function to apply to the dataset. column: int or str Name or index of the column in the dataset to create or transform. method automl (**kwargs) [source] Uses the TPOT package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are merged with atom's pipeline. The tpot instance can be accessed through the tpot attribute. Read more in the user guide . Parameters: **kwargs Keyword arguments for tpot's regressor. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method distribution (column=0) [source] Compute the KS-statistic for various distributions against a column in the dataset. Missing values are ignored. Tip Use the plot_distribution method to plot the column's distribution. Parameters: column: int or str, optional (default=0) Index or name of the column to get the statistics from. Only numerical columns are accepted. Returns: stats: pd.DataFrame Dataframe with the statistic results. method drop (columns) [source] Drop columns from the dataset. Note This approach is preferred over dropping columns from the dataset directly through the property's @setter since the transformation is saved to atom's pipeline. Parameters: columns: int, str, slice or sequence Names or indices of the columns to drop. 
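A short, hypothetical sketch of the add, apply and drop methods described above; the transformer, column indices and new column name are placeholders chosen for illustration.
from sklearn.preprocessing import PowerTransformer

# add: an unfitted transformer is fitted on the training set and appended to the pipeline
atom.add(PowerTransformer(), columns=slice(0, 5))

# apply: func receives the complete dataset; \"feature_sq\" is created if it isn't a column yet
atom.apply(lambda df: df.iloc[:, 0] ** 2, column=\"feature_sq\")

# drop: removing columns this way is recorded in atom's pipeline
atom.drop(columns=[3, 4])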
method export_pipeline (model=None) [source] Export atom's pipeline to a sklearn's Pipeline. Optionally, you can add a model as final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a StandardScaler is added. The returned pipeline is already fitted. Parameters: model: str or None, optional (default=None) Name of the model to add as a final estimator to the pipeline. If None, no model is added. Returns: pipeline: Pipeline Pipeline in the current branch as a sklearn object. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method report (dataset=\"dataset\", n_rows=None, filename=None) [source] Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for n_rows > 10k. Parameters: dataset: str, optional (default=\"dataset\") Data set to get the report from. n_rows: int or None, optional (default=None) Number of (randomly picked) rows to process. None for all rows. filename: str or None, optional (default=None) Name to save the file with (as .html). None to not save anything. Returns: report: ProfileReport Created profile object. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method save_data (filename=None, dataset=\"dataset\") [source] Save the data in the current branch to a csv file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" for default name. dataset: str, optional (default=\"dataset\") Data set to save. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. 
If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. The passed dataset is scaled if any of the models require scaled features and they are not already. method stats () [source] Print basic information about the dataset. method status () [source] Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's __repr__ but will also save it to the logger. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Utility methods"},{"location":"API/ATOM/atomregressor/#data-cleaning","text":"ATOMRegressor provides data cleaning methods to scale your features and handle missing values, categorical columns and outliers. Calling on one of them will automatically apply the method on the dataset in the pipeline. Tip Use the report method to examine the data and help you determine suitable parameters for the data cleaning methods. scale Scale the dataset. clean Applies standard data cleaning steps on the dataset. impute Handle missing values in the dataset. encode Encode categorical features. prune Prune outliers from the training set. method scale (strategy=\"standard\") [source] Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the Scaler class. method clean (prohibited_types=None, strip_categorical=True, maximum_cardinality=True, minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) [source] Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. See Cleaner for a description of the parameters. method impute (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, missing=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the missing attribute to customize what are considered \"missing values\". See Imputer for a description of the parameters. Note that since the Imputer can remove rows from both the train and test set, the size of the sets may change after the tranformation. method encode (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None) [source] Perform encoding of categorical features. The encoding type depends on the number of unique values in the column: If n_unique=2, use Label-encoding. If 2 < n_unique < = max_onehot, use OneHot-encoding. 
If n_unique > max_onehot, use `strategy`-encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. Categorical features are defined as all columns whose dtype.kind not in ifu . Will raise an error if it encounters missing values or unknown classes when transforming. The encoder is fitted only on the training set to avoid data leakage. See Encoder for a description of the parameters. method prune (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, **kwargs) [source] Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. Only outliers from the training set are pruned in order to maintain the original distribution of samples in the test set. See Pruner for a description of the parameters.","title":"Data cleaning"},{"location":"API/ATOM/atomregressor/#feature-engineering","text":"To further pre-process the data, you can create new non-linear features by transforming the existing ones or, if your dataset is too large, remove features using one of the provided strategies. feature_generation Create new features from combinations of existing ones. feature_selection Remove features according to the selected strategy. method feature_generation (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See FeatureGenerator for a description of the parameters. Attributes created by the class are attached to atom. method feature_selection (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. See FeatureSelector for a description of the parameters. Plotting methods and attributes created by the class are attached to atom. Note When strategy=\"univariate\" and solver=None, f_classif is used as default solver. When strategy is one of SFM, RFE, RFECV or SFS and the solver is one of ATOM's predefined models , the algorithm will automatically select the classifier (no need to add _class to the solver). When strategy is one of SFM, RFE, RFECV or SFS and solver=None, ATOM will use the winning model (if it exists) as solver. When strategy is RFECV or SFS, ATOM will use the metric in the pipeline (if it exists) as the scoring parameter (only if not specified).","title":"Feature engineering"},{"location":"API/ATOM/atomregressor/#training","text":"The training methods are where the models are fitted to the data and their performance is evaluated according to the selected metric. There are three methods to call the three different training approaches in ATOM. All relevant attributes and methods from the training classes are attached to atom for convenience. These include the errors, winner and results attributes, as well as the models , and the prediction and plotting methods. run Fit the models to the data in a direct fashion. successive_halving Fit the models to the data in a successive halving fashion. 
train_sizing Fit the models to the data in a train sizing fashion. method run (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a DirectRegressor instance. method successive_halving (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a SuccessiveHalvingRegressor instance. method train_sizing (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0) [source] Runs a TrainSizingRegressor instance.","title":"Training"},{"location":"API/ATOM/atomregressor/#example","text":"from sklearn.datasets import load_boston from atom import ATOMRegressor X, y = load_boston(return_X_y=True) # Initialize class atom = ATOMRegressor(X, y, logger=\"auto\", n_jobs=2, verbose=2) # Apply data cleaning methods atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2, include_target=True) # Fit the models to the data atom.run( models=[\"OLS\", \"BR\", \"CatB\"], metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Analyze the results print(f\"The winning model is: {atom.winner.name}\") print(atom.results) # Make some plots atom.plot_errors(figsize=(9, 6), filename=\"errors.png\") atom.CatB.plot_feature_importance(filename=\"catboost_feature_importance.png\") # Run an extra model atom.run( models=\"MLP\", metric=\"MSE\", n_calls=25, n_initial_points=10, bo_params={\"cv\": 1}, bagging=4, ) # Get the predictions for the best model on new data predictions = atom.predict(X_new)","title":"Example"},{"location":"API/data_cleaning/balancer/","text":"Balancer class atom.data_cleaning. Balancer (strategy=\"ADASYN\", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide . Parameters: strategy: str, optional (default=\"ADASYN\") Type of algorithm to use for oversampling or undersampling. Choose from one of the estimators available in the imbalanced-learn package. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's classes attribute for an overview of the target class distribution per data set. 
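The attributes listed next include the fitted imbalanced-learn estimator, which is exposed under the lowercase strategy name. A hedged sketch of that interaction, assuming a binary classification task with X_train and y_train already defined; the strategy and sampling_strategy values are examples only.
from atom.data_cleaning import Balancer

# **kwargs (here sampling_strategy) are passed on to the underlying imblearn estimator
balancer = Balancer(strategy=\"SMOTE\", sampling_strategy=0.8, random_state=1)
X_bal, y_bal = balancer.transform(X_train, y_train)

# the estimator used for the resampling is available as a lowercase attribute
print(balancer.smote)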
Attributes Attributes: : imblearn estimator Estimator instance (lowercase strategy) used to oversample or undersample the data, e.g. balancer.adasyn for the default strategy. mapping: dict Dictionary of the target values mapped to their respective encoded integer. Methods get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Balancer Estimator instance. method transform (X, y) [source] Oversample or undersample the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.balance(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) or from atom.data_cleaning import Balancer balancer = Balancer(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) X_train, y_train = balancer.transform(X_train, y_train)","title":"Balancer"},{"location":"API/data_cleaning/balancer/#balancer","text":"class atom.data_cleaning. Balancer (strategy=\"ADASYN\", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide . Parameters: strategy: str, optional (default=\"ADASYN\") Type of algorithm to use for oversampling or undersampling. Choose from one of the estimators available in the imbalanced-learn package. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. 
random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's classes attribute for an overview of the target class distribution per data set.","title":"Balancer"},{"location":"API/data_cleaning/balancer/#attributes","text":"Attributes: : imblearn estimator Estimator instance (lowercase strategy) used to oversample or undersample the data, e.g. balancer.adasyn for the default strategy. mapping: dict Dictionary of the target values mapped to their respective encoded integer.","title":"Attributes"},{"location":"API/data_cleaning/balancer/#methods","text":"get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Balancer Estimator instance. method transform (X, y) [source] Oversample or undersample the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column.","title":"Methods"},{"location":"API/data_cleaning/balancer/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.balance(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) or from atom.data_cleaning import Balancer balancer = Balancer(strategy=\"NearMiss\", sampling_strategy=0.7, n_neighbors=10) X_train, y_train = balancer.transform(X_train, y_train)","title":"Example"},{"location":"API/data_cleaning/cleaner/","text":"Cleaner class atom.data_cleaning. Cleaner (prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True, strip_categorical=True, drop_duplicates=False, missing_target=True, encode_target=True, verbose=0, logger=None) [source] Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. This class can be accessed from atom through the clean method. Read more in the user guide . 
Parameters: prohibited_types: str, sequence or None, optional (default=None) Columns with these types are dropped from the dataset. maximum_cardinality: bool, optional (default=True) Whether to drop categorical columns with maximum cardinality, i.e. the number of unique values is equal to the number of instances. Usually the case for names, IDs, etc... minimum_cardinality: bool, optional (default=True) Whether to drop columns with minimum cardinality, i.e. all values in the column are the same. strip_categorical: bool, optional (default=True) Whether to strip the spaces from the categorical columns. drop_duplicates: bool, optional (default=False) Whether to drop duplicate rows. Only the first occurrence of every duplicated row is kept. missing_target: bool, optional (default=True) Whether to drop rows with missing values in the target column. Is ignored if y is not provided. encode_target: bool, optional (default=True) Whether to Label-encode the target column. Is ignored if y is not provided. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Attributes Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. mapping: dict Dictionary of the target values mapped to their respective encoded integer. Only available if encode_target=True. Methods fit_transform Same as transform. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. 
Returns: self: Cleaner Estimator instance. method transform (X, y=None) [source] Apply the data cleaning steps on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean(maximum_cardinality=False) or from atom.data_cleaning import Cleaner cleaner = Cleaner(maximum_cardinality=False) X, y = cleaner.transform(X, y)","title":"Cleaner"},{"location":"API/data_cleaning/cleaner/#cleaner","text":"class atom.data_cleaning. Cleaner (prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True, strip_categorical=True, drop_duplicates=False, missing_target=True, encode_target=True, verbose=0, logger=None) [source] Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are: Drop columns with prohibited data types. Drop categorical columns with maximal cardinality. Drop columns with minimum cardinality. Strip categorical features from white spaces. Drop duplicate rows. Drop rows with missing values in the target column. Encode the target column. This class can be accessed from atom through the clean method. Read more in the user guide . Parameters: prohibited_types: str, sequence or None, optional (default=None) Columns with these types are dropped from the dataset. maximum_cardinality: bool, optional (default=True) Whether to drop categorical columns with maximum cardinality, i.e. the number of unique values is equal to the number of instances. Usually the case for names, IDs, etc... minimum_cardinality: bool, optional (default=True) Whether to drop columns with minimum cardinality, i.e. all values in the column are the same. strip_categorical: bool, optional (default=True) Whether to strip the spaces from the categorical columns. drop_duplicates: bool, optional (default=False) Whether to drop duplicate rows. Only the first occurrence of every duplicated row is kept. missing_target: bool, optional (default=True) Whether to drop rows with missing values in the target column. Is ignored if y is not provided. encode_target: bool, optional (default=True) Whether to Label-encode the target column. Is ignored if y is not provided. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation.","title":"Cleaner"},{"location":"API/data_cleaning/cleaner/#attributes","text":"Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. mapping: dict Dictionary of the target values mapped to their respective encoded integer. 
Only available if encode_target=True.","title":"Attributes"},{"location":"API/data_cleaning/cleaner/#methods","text":"fit_transform Same as transform. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Cleaner Estimator instance. method transform (X, y=None) [source] Apply the data cleaning steps on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/cleaner/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean(maximum_cardinality=False) or from atom.data_cleaning import Cleaner cleaner = Cleaner(maximum_cardinality=False) X, y = cleaner.transform(X, y)","title":"Example"},{"location":"API/data_cleaning/encoder/","text":"Encoder class atom.data_cleaning. Encoder (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None, verbose=0, logger=None, **kwargs) [source] Perform encoding of categorical features. The encoding type depends on the number of classes in the column: If n_classes=2, use Ordinal-encoding. If 2 < n_classes <= max_onehot , use OneHot-encoding. If n_classes > max_onehot , use strategy -encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. An error is raised if it encounters missing values or unknown classes when transforming. This class can be accessed from atom through the encode method. Read more in the user guide . Parameters: strategy: str, optional (default=\"LeaveOneOut\") Type of encoding to use for high cardinality features. 
Choose from one of the estimators available in the category-encoders package except for: OneHotEncoder: Use the max_onehot parameter. HashingEncoder: Incompatibility of APIs. max_onehot: int or None, optional (default=10) Maximum number of unique values in a feature to perform one-hot-encoding. If None, it will always use strategy when n_unique > 2. frac_to_other: float, optional (default=None) Classes with less occurrences than n_rows * frac_to_other are replaced with the string other . If None, skip this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's categorical attribute for a list of the categorical columns in the dataset. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: Encoder Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Encoder Estimator instance. method transform (X, y=None) [source] Encode the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set. 
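To make the three encoding regimes concrete, here is a toy sketch that complements the example below; the columns are constructed so that each one falls in a different branch of the rule (two classes, at most max_onehot classes, more than max_onehot classes), and the data itself is meaningless.
import pandas as pd
from atom.data_cleaning import Encoder

X = pd.DataFrame({
    \"sex\": [\"m\", \"f\", \"m\", \"f\", \"m\", \"f\"],        # 2 classes -> Ordinal-encoding
    \"city\": [\"NY\", \"LA\", \"SF\", \"NY\", \"LA\", \"SF\"],  # 3 classes <= max_onehot -> OneHot-encoding
    \"word\": [\"a\", \"b\", \"c\", \"d\", \"e\", \"f\"],        # 6 classes > max_onehot -> LeaveOneOut-encoding
})
y = [0, 1, 0, 1, 0, 1]

encoder = Encoder(strategy=\"LeaveOneOut\", max_onehot=3)
X_enc = encoder.fit_transform(X, y)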
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.encode(strategy=\"CatBoost\", max_onehot=5) or from atom.data_cleaning import Encoder encoder = Encoder(strategy=\"CatBoost\", max_onehot=5) encoder.fit(X_train, y_train) X = encoder.transform(X)","title":"Encoder"},{"location":"API/data_cleaning/encoder/#encoder","text":"class atom.data_cleaning. Encoder (strategy=\"LeaveOneOut\", max_onehot=10, frac_to_other=None, verbose=0, logger=None, **kwargs) [source] Perform encoding of categorical features. The encoding type depends on the number of classes in the column: If n_classes=2, use Ordinal-encoding. If 2 < n_classes <= max_onehot , use OneHot-encoding. If n_classes > max_onehot , use strategy -encoding. Also replaces classes with low occurrences with the value other in order to prevent too high cardinality. An error is raised if it encounters missing values or unknown classes when transforming. This class can be accessed from atom through the encode method. Read more in the user guide . Parameters: strategy: str, optional (default=\"LeaveOneOut\") Type of encoding to use for high cardinality features. Choose from one of the estimators available in the category-encoders package except for: OneHotEncoder: Use the max_onehot parameter. HashingEncoder: Incompatibility of APIs. max_onehot: int or None, optional (default=10) Maximum number of unique values in a feature to perform one-hot-encoding. If None, it will always use strategy when n_unique > 2. frac_to_other: float, optional (default=None) Classes with less occurrences than n_rows * frac_to_other are replaced with the string other . If None, skip this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's categorical attribute for a list of the categorical columns in the dataset.","title":"Encoder"},{"location":"API/data_cleaning/encoder/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: Encoder Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. 
Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Encoder Estimator instance. method transform (X, y=None) [source] Encode the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set.","title":"Methods"},{"location":"API/data_cleaning/encoder/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.encode(strategy=\"CatBoost\", max_onehot=5) or from atom.data_cleaning import Encoder encoder = Encoder(strategy=\"CatBoost\", max_onehot=5) encoder.fit(X_train, y_train) X = encoder.transform(X)","title":"Example"},{"location":"API/data_cleaning/imputer/","text":"Imputer class atom.data_cleaning. Imputer (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, verbose=0, logger=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the missing attribute to customize what are considered \"missing values\". This class can be accessed from atom through the impute method. Read more in the user guide . Parameters: strat_num: str, int or float, optional (default=\"drop\") Imputing strategy for numerical columns. Choose from: \"drop\": Drop rows containing missing values. \"mean\": Impute with mean of column. \"median\": Impute with median of column. \"knn\": Impute using a K-Nearest Neighbors approach. \"most_frequent\": Impute with most frequent value. int or float: Impute with provided numerical value. strat_cat: str, optional (default=\"drop\") Imputing strategy for categorical columns. Choose from: \"drop\": Drop rows containing missing values. \"most_frequent\": Impute with most frequent value. str: Impute with provided string. min_frac_rows: float or None, optional (default=None) Minimum fraction of non-missing values in a row (if less, the row is removed). If None, ignore this step. min_frac_cols: float or None, optional (default=None) Minimum fraction of non-missing values in a column (if less, the column is removed). If None, ignore this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's nans attribute for an overview of the number of missing values per column. 
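A minimal sketch of that workflow through atom, assuming an atom instance whose dataset contains missing values; the strategies and fraction thresholds are illustrative, not defaults.
print(atom.nans)  # number of missing values per column

atom.impute(
    strat_num=\"median\",
    strat_cat=\"most_frequent\",
    min_frac_rows=0.5,  # drop rows with less than 50% non-missing values
    min_frac_cols=0.5,  # drop columns with less than 50% non-missing values
)

print(atom.n_nans)  # samples that still contain missing values (expected: 0)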
Attributes Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Imputer Fitted instance of self. method fit_transform (X, y=None) [source] Fit the Imputer and return the imputed data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: imputer Estimator instance. method transform (X, y=None) [source] Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,) Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) or from atom.data_cleaning import Imputer imputer = Imputer(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) imputer.fit(X_train, y_train) X = imputer.transform(X)","title":"Imputer"},{"location":"API/data_cleaning/imputer/#imputer","text":"class atom.data_cleaning. 
Imputer (strat_num=\"drop\", strat_cat=\"drop\", min_frac_rows=None, min_frac_cols=None, verbose=0, logger=None) [source] Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the missing attribute to customize what are considered \"missing values\". This class can be accessed from atom through the impute method. Read more in the user guide . Parameters: strat_num: str, int or float, optional (default=\"drop\") Imputing strategy for numerical columns. Choose from: \"drop\": Drop rows containing missing values. \"mean\": Impute with mean of column. \"median\": Impute with median of column. \"knn\": Impute using a K-Nearest Neighbors approach. \"most_frequent\": Impute with most frequent value. int or float: Impute with provided numerical value. strat_cat: str, optional (default=\"drop\") Imputing strategy for categorical columns. Choose from: \"drop\": Drop rows containing missing values. \"most_frequent\": Impute with most frequent value. str: Impute with provided string. min_frac_rows: float or None, optional (default=None) Minimum fraction of non-missing values in a row (if less, the row is removed). If None, ignore this step. min_frac_cols: float or None, optional (default=None) Minimum fraction of non-missing values in a column (if less, the column is removed). If None, ignore this step. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's nans attribute for an overview of the number of missing values per column.","title":"Imputer"},{"location":"API/data_cleaning/imputer/#attributes","text":"Attributes: missing: list List of values that are considered \"missing\". Default values are: \"\", \"?\", \"None\", \"NA\", \"nan\", \"NaN\" and \"inf\". Note that None , NaN , +inf and -inf are always considered missing since they are incompatible with sklearn estimators.","title":"Attributes"},{"location":"API/data_cleaning/imputer/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Imputer Fitted instance of self. method fit_transform (X, y=None) [source] Fit the Imputer and return the imputed data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. 
y: pd.Series Transformed target column. Only returned if provided. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: imputer Estimator instance. method transform (X, y=None) [source] Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,) Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/imputer/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) or from atom.data_cleaning import Imputer imputer = Imputer(strat_num=\"knn\", strat_cat=\"drop\", min_frac_cols=0.8) imputer.fit(X_train, y_train) X = imputer.transform(X)","title":"Example"},{"location":"API/data_cleaning/pruner/","text":"Pruner class atom.data_cleaning. Pruner (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, verbose=0, logger=None, **kwargs) [source] Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed from atom through the prune method. Read more in the user guide . Parameters: strategy: str, optional (default=\"z-score\") Strategy with which to select the outliers. Choose from: \"z-score\": Uses the z-score of each data value. \"iForest\": Uses an Isolation Forest . \"EE\": Uses an Elliptic Envelope . \"LOF\": Uses a Local Outlier Factor . \"SVM\": Uses a One-class SVM . \"DBSCAN\": Uses DBSCAN clustering. \"OPTICS\": Uses OPTICS clustering. method: int, float or str, optional (default=\"drop\") Method to apply on the outliers. Only the z-score strategy accepts another method than \"drop\". Choose from: \"drop\": Drop any sample with outlier values. \"min_max\": Replace the outlier with the min or max of the column. Any numerical value with which to replace the outliers. max_sigma: int or float, optional (default=3) Maximum allowed standard deviations from the mean of the column. If more, it is considered an outlier. Only if strategy=\"z-score\". include_target: bool, optional (default=False) Whether to include the target column in the transformation. This can be useful for regression tasks. verbose: int, optional (default=0) Verbosity level of the class. 
Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's outliers attribute for an overview of the number of outlier values per column. Attributes Attributes: : sklearn estimator Estimator instance (lowercase strategy) used to prune the data, e.g. pruner.iforest for the isolation forest strategy. Methods get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Outliers Estimator instance. method transform (X, y=None) [source] Apply the outlier strategy on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. X: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.prune(strategy=\"z-score\", max_sigma=2, include_target=True) or from atom.data_cleaning import Pruner pruner = Pruner(strategy=\"z-score\", max_sigma=2, include_target=True) X_train, y_train = pruner.transform(X_train, y_train)","title":"Pruner"},{"location":"API/data_cleaning/pruner/#pruner","text":"class atom.data_cleaning. Pruner (strategy=\"z-score\", method=\"drop\", max_sigma=3, include_target=False, verbose=0, logger=None, **kwargs) [source] Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed from atom through the prune method. Read more in the user guide . Parameters: strategy: str, optional (default=\"z-score\") Strategy with which to select the outliers. Choose from: \"z-score\": Uses the z-score of each data value. \"iForest\": Uses an Isolation Forest . \"EE\": Uses an Elliptic Envelope . \"LOF\": Uses a Local Outlier Factor . \"SVM\": Uses a One-class SVM . \"DBSCAN\": Uses DBSCAN clustering. \"OPTICS\": Uses OPTICS clustering. 
method: int, float or str, optional (default=\"drop\") Method to apply on the outliers. Only the z-score strategy accepts another method than \"drop\". Choose from: \"drop\": Drop any sample with outlier values. \"min_max\": Replace the outlier with the min or max of the column. Any numerical value with which to replace the outliers. max_sigma: int or float, optional (default=3) Maximum allowed standard deviations from the mean of the column. If more, it is considered an outlier. Only if strategy=\"z-score\". include_target: bool, optional (default=False) Whether to include the target column in the transformation. This can be useful for regression tasks. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. **kwargs Additional keyword arguments passed to the strategy estimator. Tip Use atom's outliers attribute for an overview of the number of outlier values per column.","title":"Pruner"},{"location":"API/data_cleaning/pruner/#attributes","text":"Attributes: : sklearn estimator Estimator instance (lowercase strategy) used to prune the data, e.g. pruner.iforest for the isolation forest strategy.","title":"Attributes"},{"location":"API/data_cleaning/pruner/#methods","text":"get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Outliers Estimator instance. method transform (X, y=None) [source] Apply the outlier strategy on the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. X: pd.Series Transformed target column. 
Only returned if provided.","title":"Methods"},{"location":"API/data_cleaning/pruner/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.prune(strategy=\"z-score\", max_sigma=2, include_target=True) or from atom.data_cleaning import Pruner pruner = Pruner(strategy=\"z-score\", max_sigma=2, include_target=True) X_train, y_train = pruner.transform(X_train, y_train)","title":"Example"},{"location":"API/data_cleaning/scaler/","text":"Scaler class atom.data_cleaning. Scaler (strategy=\"standard\", verbose=0, logger=None) [source] This class applies one of sklearn's scalers. It also returns a dataframe when provided, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the scale method. Read more in the user guide . Parameters: strategy: str, optional (default=\"standard\") Scaler object with which to scale the data. Options are: standard: Scale with StandardScaler . minmax: Scale with MinMaxScaler . maxabs: Scale with MaxAbsScaler . robust: Scale with RobustScaler . verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's scaled attribute to check if the feature set is scaled. Attributes Attributes: scaler: sklearn transformer Estimator's instance with which the data is scaled. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Compute the mean and std to be used for later scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Scaler Fitted instance of self. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. 
Returns: self: Scaler Estimator instance. method transform (X, y=None) [source] Perform standardization by centering and scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.scale() or from atom.data_cleaning import Scaler scaler = Scaler() scaler.fit(X_train) X = scaler.transform(X)","title":"Scaler"},{"location":"API/data_cleaning/scaler/#scaler","text":"class atom.data_cleaning. Scaler (strategy=\"standard\", verbose=0, logger=None) [source] This class applies one of sklearn's scalers. It also returns a dataframe when provided, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the scale method. Read more in the user guide . Parameters: strategy: str, optional (default=\"standard\") Scaler object with which to scale the data. Options are: standard: Scale with StandardScaler . minmax: Scale with MinMaxScaler . maxabs: Scale with MaxAbsScaler . robust: Scale with RobustScaler . verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. Tip Use atom's scaled attribute to check if the feature set is scaled.","title":"Scaler"},{"location":"API/data_cleaning/scaler/#attributes","text":"Attributes: scaler: sklearn transformer Estimator's instance with which the data is scaled.","title":"Attributes"},{"location":"API/data_cleaning/scaler/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Compute the mean and std to be used for later scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: self: Scaler Fitted instance of self. method fit_transform (X, y=None) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. 
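A hedged sketch of the behaviour described in the Scaler section above: non-numerical columns are left untouched and the output remains a dataframe. The printed attribute follows the Attributes section; X_train is assumed to be loaded as in the other examples.

```python
from atom.data_cleaning import Scaler

scaler = Scaler(strategy="robust")        # wraps sklearn's RobustScaler
X_train = scaler.fit_transform(X_train)   # categorical columns pass through as-is

print(type(X_train))   # pandas DataFrame, not a numpy array
print(scaler.scaler)   # the fitted sklearn scaler instance
```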
method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: Scaler Estimator instance. method transform (X, y=None) [source] Perform standardization by centering and scaling. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Scaled feature set.","title":"Methods"},{"location":"API/data_cleaning/scaler/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.scale() or from atom.data_cleaning import Scaler scaler = Scaler() scaler.fit(X_train) X = scaler.transform(X)","title":"Example"},{"location":"API/feature_engineering/feature_generator/","text":"FeatureGenerator class atom.feature_engineering. FeatureGenerator (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide . Parameters: strategy: str, optional (default=\"DFS\") Strategy to crate new features. Choose from: \"DFS\" to use Deep Feature Synthesis. \"GFG\" or \"genetic\" to use Genetic Feature Generation. n_features: int or None, optional (default=None) Number of newly generated features to add to the dataset (no more than 1% of the population for the genetic strategy). If None, select all created features. generations: int, optional (default=20) Number of generations to evolve. Only for the genetic strategy. population: int, optional (default=500) Number of programs in each generation. Only for the genetic strategy. operators: str, list, tuple or None, optional (default=None) Name of the operators to be used on the features. None to use all. Choose from: \"add\", \"sub\", \"mul\", \"div\", \"sqrt\", \"log\", \"sin\", \"cos\", \"tan\". n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. 
Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task. Attributes Attributes: symbolic_transformer: SymbolicTransformer Instance used to calculate the genetic features. Only for the genetic strategy. genetic_features: pd.DataFrame Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include: name: Name of the feature (automatically created). description: Operators used to create this feature. fitness: Fitness score. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureGenerator Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Feature set with the newly generated features. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureGenerator Estimator instance. method transform (X, y=None) [source] Generate new features. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Feature set with the newly generated features. 
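The warning above about running DFS with n_jobs>1 can be followed with a standard entry-point guard. A sketch, assuming X_train and y_train are loaded beforehand as in the example below:

```python
from atom.feature_engineering import FeatureGenerator

def generate_features():
    # Deep Feature Synthesis on two cores; the spawned worker processes
    # import __main__, so the call is kept inside a function.
    fg = FeatureGenerator(strategy="DFS", n_features=10, n_jobs=2)
    fg.fit(X_train, y_train)
    return fg.transform(X_train)

if __name__ == "__main__":
    # The guard prevents the spawned processes from re-running this block.
    X_train_new = generate_features()
```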
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_generation(strategy=\"genetic\", n_features=3, generations=30, population=400) or from atom.feature_engineering import FeatureGenerator feature_generator = FeatureGenerator(strategy=\"genetic\", n_features=3, generations=30, population=400) feature_generator.fit(X_train, y_train) X = feature_generator.transform(X)","title":"FeatureGenerator"},{"location":"API/feature_engineering/feature_generator/#featuregenerator","text":"class atom.feature_engineering. FeatureGenerator (strategy=\"DFS\", n_features=None, generations=20, population=500, operators=None, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Use Deep feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide . Parameters: strategy: str, optional (default=\"DFS\") Strategy to crate new features. Choose from: \"DFS\" to use Deep Feature Synthesis. \"GFG\" or \"genetic\" to use Genetic Feature Generation. n_features: int or None, optional (default=None) Number of newly generated features to add to the dataset (no more than 1% of the population for the genetic strategy). If None, select all created features. generations: int, optional (default=20) Number of generations to evolve. Only for the genetic strategy. population: int, optional (default=500) Number of programs in each generation. Only for the genetic strategy. operators: str, list, tuple or None, optional (default=None) Name of the operators to be used on the features. None to use all. Choose from: \"add\", \"sub\", \"mul\", \"div\", \"sqrt\", \"log\", \"sin\", \"cos\", \"tan\". n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Tip DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features! Warning Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property. Warning When using DFS with n_jobs>1 , make sure to protect your code with if __name__ == \"__main__\" . Featuretools uses dask , which uses python multiprocessing for parallelization. 
The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task.","title":"FeatureGenerator"},{"location":"API/feature_engineering/feature_generator/#attributes","text":"Attributes: symbolic_transformer: SymbolicTransformer Instance used to calculate the genetic features. Only for the genetic strategy. genetic_features: pd.DataFrame Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include: name: Name of the feature (automatically created). description: Operators used to create this feature. fitness: Fitness score.","title":"Attributes"},{"location":"API/feature_engineering/feature_generator/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y) [source] Fit to data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureGenerator Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Feature set with the newly generated features. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureGenerator Estimator instance. method transform (X, y=None) [source] Generate new features. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. 
Returns: X: pd.DataFrame Feature set with the newly generated features.","title":"Methods"},{"location":"API/feature_engineering/feature_generator/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_generation(strategy=\"genetic\", n_features=3, generations=30, population=400) or from atom.feature_engineering import FeatureGenerator feature_generator = FeatureGenerator(strategy=\"genetic\", n_features=3, generations=30, population=400) feature_generator.fit(X_train, y_train) X = feature_generator.transform(X)","title":"Example"},{"location":"API/feature_engineering/feature_selector/","text":"FeatureSelector class atom.feature_engineering. FeatureSelector (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. This class can be accessed from atom through the feature_selection method. Read more in the user guide . Parameters: strategy: string or None, optional (default=None) Feature selection strategy to use. Choose from: None: Do not perform any feature selection algorithm. \"univariate\": Select best features according to a univariate F-test. \"PCA\": Perform principal component analysis. \"SFM\": Select best features according to a model. \"RFE\": Perform recursive feature elimination. \"RFECV\": Perform RFE with cross-validated selection. \"SFS\": Perform Sequential Feature Selection. solver: string, estimator or None, optional (default=None) Solver or model to use for the feature selection strategy. See sklearn's documentation for an extended description of the choices. Select None for the default option per strategy (only for univariate and PCA). for \"univariate\", choose from: \"f_classif\" \"f_regression\" \"mutual_info_classif\" \"mutual_info_regression\" \"chi2\" Any function taking two arrays (X, y), and returning arrays (scores, p-values). See the sklearn documentation . for \"PCA\", choose from: \"auto\" (default) \"full\" \"arpack\" \"randomized\" for \"SFM\", \"RFE\", \"RFECV\" and \"SFS\": The base estimator. For SFM, RFE and RFECV, it should have either a either a feature_importances_ or coef_ attribute after fitting. You can use one of ATOM's predefined models . Add _class or _reg after the model's name to specify a classification or regression task, e.g. solver=\"LGB_reg\" (not necessary if called from an atom instance. No default option. n_features: int, float or None, optional (default=None) Number of features to select. Choose from: if None: Select all features. if < 1: Fraction of the total features to select. if >= 1: Number of features to select. If strategy=\"SFM\" and the threshold parameter is not specified, the threshold is set to -np.inf to select the n_features features. If strategy=\"RFECV\", it's the minimum number of features to select. max_frac_repeated: float or None, optional (default=1.) Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. None to skip this step. 
max_correlation: float or None, optional (default=1.) Minimum value of the Pearson correlation coefficient to identify correlated features. A value of 1 removes on of 2 equal columns. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. None to skip this step. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Any extra keyword argument for the PCA, SFM, RFE, RFECV and SFS estimators. See the corresponding sklearn documentation for the available options. Info If strategy=\"PCA\", the data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE, RFECV AND SFS strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs. Attributes Utility attributes Attributes: collinear: pd.DataFrame Dataframe of the removed collinear features. Columns include: drop_feature: Name of the feature dropped by the method. correlated feature: Name of the correlated feature(s). correlation_value: Pearson correlation coefficients of the feature pairs. feature_importance: list Remaining features ordered by importance. Only if strategy in [\"univariate\", \"SFM, \"RFE\", \"RFECV\"]. For RFE and RFECV, the importance is extracted from the external estimator fitted on the reduced set. : sklearn estimator Estimator instance (lowercase strategy) used to transform the data, e.g. balancer.pca for the PCA strategy. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per component. plot_rfecv Plot the scores obtained by the estimator on the RFECV. reset_aesthetics Reset the plot aesthetics to their default values. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. 
Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies all need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureSelector Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. See plot_pca for a description of the parameters. method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. See plot_components for a description of the parameters. method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the scores obtained by the estimator fitted on every subset of the data. See plot_rfecv for a description of the parameters. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureSelector Estimator instance. method transform (X, y=None) [source] Transform the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. Returns: X: pd.DataFrame Transformed feature set. 
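Before the example below, a short sketch of the solver naming convention described in the parameters above, using SFM with one of atom's predefined model acronyms (RF for random forest is assumed here); the _class suffix is only needed when the class is used outside an atom instance.

```python
from atom.feature_engineering import FeatureSelector

# Keep 80% of the features according to a random forest's feature importances.
fs = FeatureSelector(strategy="SFM", solver="RF_class", n_features=0.8)
fs.fit(X_train, y_train)
X = fs.transform(X)
```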
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(stratgey=\"pca\", n_features=12, whiten=True, max_correlation=0.96) atom.plot_pca(filename=\"pca\", figsize=(8, 5)) or from atom.feature_engineering import FeatureSelector feature_selector = FeatureSelector(stratgey=\"pca\", n_features=12, whiten=True, max_correlation=0.96) feature_selector.fit(X_train, y_train) X = feature_selector.transform(X, y) feature_selector.plot_pca(filename=\"pca\", figsize=(8, 5))","title":"FeatureSelector"},{"location":"API/feature_engineering/feature_selector/#featureselector","text":"class atom.feature_engineering. FeatureSelector (strategy=None, solver=None, n_features=None, max_frac_repeated=1., max_correlation=1., n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source] Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. This class can be accessed from atom through the feature_selection method. Read more in the user guide . Parameters: strategy: string or None, optional (default=None) Feature selection strategy to use. Choose from: None: Do not perform any feature selection algorithm. \"univariate\": Select best features according to a univariate F-test. \"PCA\": Perform principal component analysis. \"SFM\": Select best features according to a model. \"RFE\": Perform recursive feature elimination. \"RFECV\": Perform RFE with cross-validated selection. \"SFS\": Perform Sequential Feature Selection. solver: string, estimator or None, optional (default=None) Solver or model to use for the feature selection strategy. See sklearn's documentation for an extended description of the choices. Select None for the default option per strategy (only for univariate and PCA). for \"univariate\", choose from: \"f_classif\" \"f_regression\" \"mutual_info_classif\" \"mutual_info_regression\" \"chi2\" Any function taking two arrays (X, y), and returning arrays (scores, p-values). See the sklearn documentation . for \"PCA\", choose from: \"auto\" (default) \"full\" \"arpack\" \"randomized\" for \"SFM\", \"RFE\", \"RFECV\" and \"SFS\": The base estimator. For SFM, RFE and RFECV, it should have either a either a feature_importances_ or coef_ attribute after fitting. You can use one of ATOM's predefined models . Add _class or _reg after the model's name to specify a classification or regression task, e.g. solver=\"LGB_reg\" (not necessary if called from an atom instance. No default option. n_features: int, float or None, optional (default=None) Number of features to select. Choose from: if None: Select all features. if < 1: Fraction of the total features to select. if >= 1: Number of features to select. If strategy=\"SFM\" and the threshold parameter is not specified, the threshold is set to -np.inf to select the n_features features. If strategy=\"RFECV\", it's the minimum number of features to select. max_frac_repeated: float or None, optional (default=1.) Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. None to skip this step. max_correlation: float or None, optional (default=1.) 
Minimum value of the Pearson correlation coefficient to identify correlated features. A value of 1 removes on of 2 equal columns. A dataframe of the removed features and their correlation values can be accessed through the collinear attribute. None to skip this step. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . **kwargs Any extra keyword argument for the PCA, SFM, RFE, RFECV and SFS estimators. See the corresponding sklearn documentation for the available options. Info If strategy=\"PCA\", the data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already). Tip Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead. Warning The RFE, RFECV AND SFS strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.","title":"FeatureSelector"},{"location":"API/feature_engineering/feature_selector/#attributes","text":"","title":"Attributes"},{"location":"API/feature_engineering/feature_selector/#utility-attributes","text":"Attributes: collinear: pd.DataFrame Dataframe of the removed collinear features. Columns include: drop_feature: Name of the feature dropped by the method. correlated feature: Name of the correlated feature(s). correlation_value: Pearson correlation coefficients of the feature pairs. feature_importance: list Remaining features ordered by importance. Only if strategy in [\"univariate\", \"SFM, \"RFE\", \"RFECV\"]. For RFE and RFECV, the importance is extracted from the external estimator fitted on the reduced set. : sklearn estimator Estimator instance (lowercase strategy) used to transform the data, e.g. balancer.pca for the PCA strategy.","title":"Utility attributes"},{"location":"API/feature_engineering/feature_selector/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/feature_engineering/feature_selector/#methods","text":"fit Fit to data. fit_transform Fit to data, then transform it. get_params Get parameters for this estimator. log Write information to the logger and print to stdout. plot_pca Plot the explained variance ratio vs the number of components. plot_components Plot the explained variance ratio per component. 
plot_rfecv Plot the scores obtained by the estimator on the RFECV. reset_aesthetics Reset the plot aesthetics to their default values. save Save the instance to a pickle file. set_params Set the parameters of this estimator. transform Transform the data. method fit (X, y=None) [source] Fit to data. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies all need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: self: FeatureSelector Fitted instance of self. method fit_transform (X, y) [source] Fit to data, then transform it. Note that the univariate, SFM (when model is not fitted), RFE and RFECV strategies need a target column. Leaving it None will raise an exception. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored. If int: Index of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). Returns: X: pd.DataFrame Transformed feature set. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. See plot_pca for a description of the parameters. method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per components. See plot_components for a description of the parameters. method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the scores obtained by the estimator fitted on every subset of the data. See plot_rfecv for a description of the parameters. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method save (filename=None) [source] Save the instance to a pickle file. Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: FeatureSelector Estimator instance. method transform (X, y=None) [source] Transform the data. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. 
Returns: X: pd.DataFrame Transformed feature set.","title":"Methods"},{"location":"API/feature_engineering/feature_selector/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(stratgey=\"pca\", n_features=12, whiten=True, max_correlation=0.96) atom.plot_pca(filename=\"pca\", figsize=(8, 5)) or from atom.feature_engineering import FeatureSelector feature_selector = FeatureSelector(stratgey=\"pca\", n_features=12, whiten=True, max_correlation=0.96) feature_selector.fit(X_train, y_train) X = feature_selector.transform(X, y) feature_selector.plot_pca(filename=\"pca\", figsize=(8, 5))","title":"Example"},{"location":"API/models/adab/","text":"AdaBoost (AdaB) AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on the original dataset and then fits additional copies of the algorithm on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. Corresponding estimators are: AdaBoostClassifier for classification tasks. AdaBoostRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The algorithm parameter is only used with AdaBoostClassifier. The loss parameter is only used with AdaBoostRegressor. The random_state parameter is set equal to that of the trainer. Dimensions: n_estimators: int, default=50 Integer(10, 500, name=\"n_estimators\") learning_rate: float, default=1.0 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") algorithm: str, default=\"SAMME.R\" Categorical([\"SAMME.R\", \"SAMME\"], name=\"algorithm\") loss: str, default=\"linear\" Categorical([\"linear\", \"square\", \"exponential\"], name=\"loss\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.adab.plot_permutation_importance() or atom.adab.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . 
Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"AdaB\", metric=\"poisson\", est_params={\"algorithm\": \"SAMME.R\"})","title":"AdaBoost"},{"location":"API/models/adab/#adaboost-adab","text":"AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on the original dataset and then fits additional copies of the algorithm on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. Corresponding estimators are: AdaBoostClassifier for classification tasks. AdaBoostRegressor for regression tasks. Read more in sklearn's documentation .","title":"AdaBoost (AdaB)"},{"location":"API/models/adab/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The algorithm parameter is only used with AdaBoostClassifier. The loss parameter is only used with AdaBoostRegressor. The random_state parameter is set equal to that of the trainer. Dimensions: n_estimators: int, default=50 Integer(10, 500, name=\"n_estimators\") learning_rate: float, default=1.0 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") algorithm: str, default=\"SAMME.R\" Categorical([\"SAMME.R\", \"SAMME\"], name=\"algorithm\") loss: str, default=\"linear\" Categorical([\"linear\", \"square\", \"exponential\"], name=\"loss\")","title":"Hyperparameters"},{"location":"API/models/adab/#attributes","text":"","title":"Attributes"},{"location":"API/models/adab/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/adab/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/adab/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/adab/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.adab.plot_permutation_importance() or atom.adab.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. 
rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/adab/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"AdaB\", metric=\"poisson\", est_params={\"algorithm\": \"SAMME.R\"})","title":"Example"},{"location":"API/models/ard/","text":"Automatic Relevance Determination (ARD) Automatic Relevance Determination is very similar to Bayesian Ridge , but can lead to sparser coefficients. Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to be in Gaussian distributions. Corresponding estimators are: ARDRegression for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
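As a minimal sketch of such customization (the searchable dimensions follow below), fixed values can be passed to the underlying ARDRegression through est_params, in the same way as the AdaBoost example earlier in this document; the dataset, number of calls and the fixed value are illustrative assumptions, not part of the reference:

```python
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

# Illustrative regression data; any dataset with a continuous target works.
X, y = load_diabetes(return_X_y=True, as_frame=True)

atom = ATOMRegressor(X, y)

# Run the bayesian optimization for ARD while fixing one estimator parameter.
# est_params is assumed to be forwarded to ARDRegression, as in the AdaBoost
# example shown earlier.
atom.run(
    models="ARD",
    n_calls=15,
    n_initial_points=5,
    est_params={"n_iter": 500},
)
```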
Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. 
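A short, hedged illustration of inspecting these attributes after a run (attribute access through the model's lowercase acronym follows the calling convention shown in the Methods section below; attribute names are taken from the lists above):

```python
# Assumes atom.run(models="ARD", ...) has completed, as in the example further down.
ard = atom.ard

print(ard.best_params)   # hyperparameter combination selected by the BO
print(ard.metric_test)   # metric score(s) on the test set
print(ard.bo.head())     # per-iteration BO information as a pd.DataFrame

# Prediction attributes are computed lazily: the first access triggers the
# calculation and later accesses reuse the cached result.
predictions = ard.predict_test
print(ard.score_test)
```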
Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ard.plot_permutation_importance() or atom.ard.predict(X) .The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ARD\", n_calls=20, n_initial_points=7, bagging=5)","title":"Automated Relevance Determination"},{"location":"API/models/ard/#automatic-relevance-determination-ard","text":"Automatic Relevance Determination is very similar to Bayesian Ridge , but can lead to sparser coefficients. Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to be in Gaussian distributions. Corresponding estimators are: ARDRegression for regression tasks. Read more in sklearn's documentation .","title":"Automatic Relevance Determination (ARD)"},{"location":"API/models/ard/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\")","title":"Hyperparameters"},{"location":"API/models/ard/#attributes","text":"","title":"Attributes"},{"location":"API/models/ard/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. 
columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ard/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ard/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ard/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ard.plot_permutation_importance() or atom.ard.predict(X) .The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. 
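A small sketch of these utility methods in use (reset_predictions is described next; the tag and filename below are arbitrary placeholders):

```python
# Assumes a trained ARD model in the trainer.
atom.ard.save_estimator("ard_estimator")  # pickle the fitted estimator
atom.ard.reset_predictions()              # clear cached prediction attributes to free memory
# atom.ard.delete() would remove the model from the trainer altogether.
atom.ard.rename("v2")                     # new tag; the ARD acronym stays at the start of the name
```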
method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's . If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ard/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ARD\", n_calls=20, n_initial_points=7, bagging=5)","title":"Example"},{"location":"API/models/bag/","text":"Bagging (Bag) Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree ), by introducing randomization into its construction procedure and then making an ensemble out of it. Corresponding estimators are: BaggingClassifier for classification tasks. BaggingRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=10 Integer(10, 500, name=\"n_estimators\") max_samples: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_samples\") max_features: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_features\") bootstrap: bool, default=True Categorical([True, False], name=\"bootstrap\") bootstrap_features: bool, default=False Categorical([True, False], name=\"bootstrap_features\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. 
\"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.bag.plot_permutation_importance() or atom.bag.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Bag\")","title":"Bagging"},{"location":"API/models/bag/#bagging-bag","text":"Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree ), by introducing randomization into its construction procedure and then making an ensemble out of it. Corresponding estimators are: BaggingClassifier for classification tasks. BaggingRegressor for regression tasks. Read more in sklearn's documentation .","title":"Bagging (Bag)"},{"location":"API/models/bag/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=10 Integer(10, 500, name=\"n_estimators\") max_samples: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_samples\") max_features: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"max_features\") bootstrap: bool, default=True Categorical([True, False], name=\"bootstrap\") bootstrap_features: bool, default=False Categorical([True, False], name=\"bootstrap_features\")","title":"Hyperparameters"},{"location":"API/models/bag/#attributes","text":"","title":"Attributes"},{"location":"API/models/bag/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. 
X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/bag/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/bag/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/bag/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.bag.plot_permutation_importance() or atom.bag.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/bag/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Bag\")","title":"Example"},{"location":"API/models/bnb/","text":"Bernoulli Naive Bayes (BNB) Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli models. Like Multinomial Naive bayes (MNB) , this classifier is suitable for discrete data. The difference is that while MNB works with occurrence counts, BNB is designed for binary/boolean features. Corresponding estimators are: BernoulliNB for classification tasks. Read more in sklearn's documentation . 
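Since Bernoulli Naive Bayes is a classifier, it supports the calibrate method described in the Methods sections above. A hedged sketch follows; the dataset and keyword arguments are illustrative, and the kwargs are forwarded to sklearn's CalibratedClassifierCV:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

# Illustrative binary classification data.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run(models="BNB")

# Forwarded to CalibratedClassifierCV; the calibrated classifier replaces
# the estimator attribute and the prediction attributes are reset.
atom.bnb.calibrate(method="isotonic", cv=5)

proba = atom.bnb.predict_proba_test  # recomputed on first access after calibration
```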
Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. 
predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"BNB\", metric=\"precision\")","title":"Bernoulli Naive Bayes"},{"location":"API/models/bnb/#bernoulli-naive-bayes-bnb","text":"Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli models. Like Multinomial Naive bayes (MNB) , this classifier is suitable for discrete data. The difference is that while MNB works with occurrence counts, BNB is designed for binary/boolean features. Corresponding estimators are: BernoulliNB for classification tasks. 
Read more in sklearn's documentation .","title":"Bernoulli Naive Bayes (BNB)"},{"location":"API/models/bnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/bnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/bnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/bnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/bnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. 
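Before the individual prediction attributes are listed below, a brief hedged sketch of the scoring method for a classifier such as BNB, using both a sklearn scorer name and the classifier-only custom metrics described in the Methods sections:

```python
# Assumes a trained BNB model, e.g. after atom.run(models="BNB", metric="precision").
print(atom.bnb.scoring())                       # final results for this model
print(atom.bnb.scoring("f1"))                   # any sklearn SCORERS name
print(atom.bnb.scoring("tpr"))                  # custom metric: true positive rate
print(atom.bnb.scoring("cm", dataset="train"))  # confusion matrix on the training set
```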
Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/bnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.bnb.plot_permutation_importance() or atom.bnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/bnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"BNB\", metric=\"precision\")","title":"Example"},{"location":"API/models/br/","text":"Bayesian Ridge (BR) Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand. Corresponding estimators are: BayesianRidge for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. 
mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.br.plot_permutation_importance() or atom.br.predict(X). The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes. Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"BR\", n_calls=20, n_initial_points=7, bagging=5)","title":"Bayesian Ridge"},{"location":"API/models/br/#bayesian-ridge-br","text":"Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand. Corresponding estimators are: BayesianRidge for regression tasks. Read more in sklearn's documentation.","title":"Bayesian Ridge (BR)"},{"location":"API/models/br/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
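For instance, a single hyperparameter can be fixed through est_params, which keeps it out of the search (a minimal sketch assuming est_params overrides the corresponding dimension, following the usage in the Elastic Net example; the value 500 is purely illustrative):

from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
# Illustrative: fix n_iter at 500 instead of letting the BO search Integer(100, 1000)
atom.run(models=\"BR\", n_calls=20, est_params={\"n_iter\": 500})
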
Dimensions: n_iter: float, default=300 Integer(100, 1000, name=\"n_iter\") alpha_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_1\") alpha_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"alpha_2\") lambda_1: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_1\") lambda_2: float, default=1e-6 Categorical([1e-8, 1e-6, 1e-4, 1e-2], name=\"lambda_2\")","title":"Hyperparameters"},{"location":"API/models/br/#attributes","text":"","title":"Attributes"},{"location":"API/models/br/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/br/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/br/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. 
predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/br/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.br.plot_permutation_importance() or atom.br.predict(X). The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes. Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/br/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"BR\", n_calls=20, n_initial_points=7, bagging=5)","title":"Example"},{"location":"API/models/catb/","text":"CatBoost (CatB) CatBoost is a machine learning method based on gradient boosting over decision trees. Main advantages of CatBoost: Superior quality when compared with other GBDT models on many datasets. Best in class prediction speed. Corresponding estimators are: CatBoostClassifier for classification tasks. CatBoostRegressor for regression tasks. Read more in CatBoost's documentation. Note CatBoost allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The bootstrap_type parameter is set to \"Bernoulli\" to allow for the subsample parameter. The num_leaves and min_child_samples parameters are not available for the CPU implementation. The n_jobs and random_state parameters are set equal to those of the trainer. 
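Because CatBoost supports early stopping, unpromising fits during the BO can be cut short through bo_params, and the scores recorded while training are afterwards available in the evals attribute (a minimal sketch mirroring the example at the bottom of this page; the values are illustrative):

from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
# Illustrative: bound the BO's runtime and stop unpromising fits early
atom.run(models=\"CatB\", n_calls=15, bo_params={\"max_time\": 1000, \"early_stopping\": 0.1})
# Per-iteration training scores recorded by the estimator
atom.catb.evals
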
Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_lambda: int, default=0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. 
This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"CatB\", n_calls=50, bo_params={\"max_time\": 1000, \"early_stopping\": 0.1})","title":"CatBoost"},{"location":"API/models/catb/#catboost-catb","text":"CatBoost is a machine learning method based on gradient boosting over decision trees. Main advantages of CatBoost: Superior quality when compared with other GBDT models on many datasets. Best in class prediction speed. Corresponding estimators are: CatBoostClassifier for classification tasks. CatBoostRegressor for regression tasks. Read more in CatBoost's documentation . Note CatBoost allows early stopping to stop the training of unpromising models prematurely!","title":"CatBoost (CatB)"},{"location":"API/models/catb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The bootstrap_type parameter is set to \"Bernoulli\" to allow for the subsample parameter. The num_leaves and min_child_samples parameters are not available for the CPU implementation. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_lambda: int, default=0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/catb/#attributes","text":"","title":"Attributes"},{"location":"API/models/catb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/catb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. 
time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/catb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/catb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.catb.plot_permutation_importance() or atom.catb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/catb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"CatB\", n_calls=50, bo_params={\"max_time\": 1000, \"early_stopping\": 0.1})","title":"Example"},{"location":"API/models/catnb/","text":"Categorical Naive Bayes (CatNB) Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features. Corresponding estimators are: CategoricalNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. 
List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. 
Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CatNB\")","title":"Categorical Naive Bayes"},{"location":"API/models/catnb/#categorical-naive-bayes-catnb","text":"Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features. Corresponding estimators are: CategoricalNB for classification tasks. Read more in sklearn's documentation .","title":"Categorical Naive Bayes (CatNB)"},{"location":"API/models/catnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/catnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/catnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/catnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. 
\"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/catnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/catnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.catnb.plot_permutation_importance() or atom.catnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. 
Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/catnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CatNB\")","title":"Example"},{"location":"API/models/cnb/","text":"Complement Naive Bayes (CNB) The Complement Naive Bayes classifier was designed to correct the \u201csevere assumptions\u201d made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. Corresponding estimators are: ComplementNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") norm: bool, default=False Categorical([True, False], name=\"norm\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
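These attributes can be inspected directly on the trained model through its acronym (a minimal sketch using the atom.cnb accessor documented under Methods):

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(models=\"CNB\")
atom.cnb.X_test  # test features used by the model
atom.cnb.target  # name of the target column
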
Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. 
After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CNB\")","title":"Complement Naive Bayes"},{"location":"API/models/cnb/#complement-naive-bayes-cnb","text":"The Complement Naive Bayes classifier was designed to correct the \u201csevere assumptions\u201d made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. Corresponding estimators are: ComplementNB for classification tasks. Read more in sklearn's documentation .","title":"Complement Naive Bayes (CNB)"},{"location":"API/models/cnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") norm: bool, default=False Categorical([True, False], name=\"norm\")","title":"Hyperparameters"},{"location":"API/models/cnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/cnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. 
n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/cnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/cnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/cnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.cnb.plot_permutation_importance() or atom.cnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. 
save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/cnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"CNB\")","title":"Example"},{"location":"API/models/en/","text":"Elastic Net (EN) Linear least squares with l1 and l2 regularization. Corresponding estimators are: ElasticNet for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.en.plot_permutation_importance() or atom.en.predict(X). The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes. 
Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"EN\", est_params={\"l1_ratio\": 0.75})","title":"Elastic Net"},{"location":"API/models/en/#elastic-net-en","text":"Linear least squares with l1 and l2 regularization. Corresponding estimators are: ElasticNet for regression tasks. Read more in sklearn's documentation .","title":"Elastic Net (EN)"},{"location":"API/models/en/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\")","title":"Hyperparameters"},{"location":"API/models/en/#attributes","text":"","title":"Attributes"},{"location":"API/models/en/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/en/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. 
metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/en/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/en/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.en.plot_permutation_importance() or atom.en.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/en/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"EN\", est_params={\"l1_ratio\": 0.75})","title":"Example"},{"location":"API/models/et/","text":"Extra-Trees (ET) Extra-Trees use a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Corresponding estimators are: ExtraTreesClassifier for classification tasks. ExtraTreesRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. 
metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.et.plot_permutation_importance() or atom.et.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. 
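As a minimal sketch of that workflow (assuming X and y are a feature set and target already loaded in memory, as in the examples on this page):
from atom import ATOMRegressor
atom = ATOMRegressor(X, y)
atom.run(models=\"ET\")
atom.et.reset_predictions()  # drop the cached prediction arrays to free memory
atom.et.save_estimator()  # pickles the fitted estimator under its __name__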
method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ET\", metric=\"MSE\", n_calls=5, n_initial_points=1)","title":"Extra-Trees"},{"location":"API/models/et/#extra-trees-et","text":"Extra-Trees use a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Corresponding estimators are: ExtraTreesClassifier for classification tasks. ExtraTreesRegressor for regression tasks. Read more in sklearn's documentation .","title":"Extra-Trees (ET)"},{"location":"API/models/et/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\")","title":"Hyperparameters"},{"location":"API/models/et/#attributes","text":"","title":"Attributes"},{"location":"API/models/et/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/et/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/et/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/et/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. 
atom.et.plot_permutation_importance() or atom.et.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/et/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"ET\", metric=\"MSE\", n_calls=5, n_initial_points=1)","title":"Example"},{"location":"API/models/gbm/","text":"Gradient Boosting Machine (GBM) A Gradient Boosting Machine builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Corresponding estimators are: GradientBoostingClassifier for classification tasks. GradientBoostingRegressor for regression tasks. Read more in sklearn's documentation . 
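As an illustrative sketch of running this model on a binary classification task (X and y are placeholders for a loaded dataset, and the metric choice is arbitrary):
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.run(models=\"GBM\", metric=\"f1\")
atom.gbm.predict_proba_test  # computed and cached on first access
atom.gbm.scoring(\"f1\", dataset=\"train\")  # compare against the training score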
Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. For multiclass classification tasks, the loss parameter is always set to \"deviance\". The alpha parameter is only used when loss = \"huber\" or \"quantile\". The random_state parameter is set equal to that of the trainer. Dimensions: learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") criterion: str, default=\"friedman_mse\" Categorical([\"friedman_mse\", \"mae\", \"mse\"], name=\"criterion\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_depth: int, default=3 Integer(1, 10, name=\"max_depth\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0 Real(0, 0.035, name=\"ccp_alpha\") loss: str binary classifier: default=\"deviance\" Categorical([\"deviance\", \"exponential\"], name=\"loss\") regressor: default=\"ls\" Categorical([\"ls\", \"lad\", \"huber\", \"quantile\"], name=\"loss\") alpha: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"alpha\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. 
Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() or atom.gbm.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. 
\"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GBM\")","title":"Gradient Boosting Machine"},{"location":"API/models/gbm/#gradient-boosting-machine-gbm","text":"A Gradient Boosting Machine builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Corresponding estimators are: GradientBoostingClassifier for classification tasks. GradientBoostingRegressor for regression tasks. Read more in sklearn's documentation .","title":"Gradient Boosting Machine (GBM)"},{"location":"API/models/gbm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. For multiclass classification tasks, the loss parameter is always set to \"deviance\". The alpha parameter is only used when loss = \"huber\" or \"quantile\". The random_state parameter is set equal to that of the trainer. Dimensions: learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") criterion: str, default=\"friedman_mse\" Categorical([\"friedman_mse\", \"mae\", \"mse\"], name=\"criterion\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_depth: int, default=3 Integer(1, 10, name=\"max_depth\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0 Real(0, 0.035, name=\"ccp_alpha\") loss: str binary classifier: default=\"deviance\" Categorical([\"deviance\", \"exponential\"], name=\"loss\") regressor: default=\"ls\" Categorical([\"ls\", \"lad\", \"huber\", \"quantile\"], name=\"loss\") alpha: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"alpha\")","title":"Hyperparameters"},{"location":"API/models/gbm/#attributes","text":"","title":"Attributes"},{"location":"API/models/gbm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gbm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gbm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gbm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.gbm.plot_permutation_importance() or atom.gbm.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gbm/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GBM\")","title":"Example"},{"location":"API/models/gnb/","text":"Gaussian Naive bayes (GNB) Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian. Corresponding estimators are: GaussianNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GNB has no parameters to tune with the BO. 
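A minimal sketch of what that implies in practice (assuming X and y hold a classification dataset): the estimator is simply fitted with its default parameters and can be evaluated right away.
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.run(models=\"GNB\")  # no bayesian optimization, since there is nothing to tune
atom.gnb.scoring(\"accuracy\")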
Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. 
The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"GNB\")","title":"Gaussian Naive Bayes"},{"location":"API/models/gnb/#gaussian-naive-bayes-gnb","text":"Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian. Corresponding estimators are: GaussianNB for classification tasks. Read more in sklearn's documentation .","title":"Gaussian Naive bayes (GNB)"},{"location":"API/models/gnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GNB has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/gnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/gnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gnb/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gnb.plot_permutation_importance() or atom.gnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. 
method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"GNB\")","title":"Example"},{"location":"API/models/gp/","text":"Gaussian Process (GP) Gaussian Processes are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: The prediction interpolates the observations. The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest. The disadvantages of Gaussian processes include: They are not sparse, i.e. they use the whole samples/features information to perform the prediction. They lose efficiency in high dimensional spaces, namely when the number of features exceeds a few dozens. Corresponding estimators are: GaussianProcessClassifier for classification tasks. GaussianProcessRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GP has no parameters to tune with the BO. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. 
Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GP\", metric=\"medae\")","title":"Gaussian Process"},{"location":"API/models/gp/#gaussian-process-gp","text":"Gaussian Processes are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: The prediction interpolates the observations. The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest. The disadvantages of Gaussian processes include: They are not sparse, i.e. they use the whole samples/features information to perform the prediction. They lose efficiency in high dimensional spaces, namely when the number of features exceeds a few dozens. Corresponding estimators are: GaussianProcessClassifier for classification tasks. GaussianProcessRegressor for regression tasks. Read more in sklearn's documentation .","title":"Gaussian Process (GP)"},{"location":"API/models/gp/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. GP has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/gp/#attributes","text":"","title":"Attributes"},{"location":"API/models/gp/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/gp/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/gp/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/gp/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.gp.plot_permutation_importance() or atom.gp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. 
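Below is a hedged sketch of the calibrate method described above. The cv=5 argument is only an illustration of a keyword forwarded to sklearn's CalibratedClassifierCV; the dataset is a stand-in and is not part of the original page.

```python
# Hedged sketch: calibrate the GP classifier's probabilities after training.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y)
atom.run(models="GP")

# kwargs are forwarded to CalibratedClassifierCV;
# cv=5 fits the calibrator via 5-fold cross-validation on the training data
atom.gp.calibrate(cv=5)

# the calibrated classifier replaces the estimator attribute and the
# prediction attributes are reset
print(atom.gp.estimator)
print(atom.gp.predict_proba_test[:5])
```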
method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/gp/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"GP\", metric=\"medae\")","title":"Example"},{"location":"API/models/knn/","text":"K-Nearest Neighbors (KNN) K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest neighbors vote. For regression, the target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Corresponding estimators are: KNeighborsClassifier for classification tasks. KNeighborsRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. Dimensions: n_neighbors: int, default=5 Integer(1, 100, name=\"n_neighbors\") weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
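The sketch below runs the BO over the KNN dimensions listed above. The dataset, metric, number of calls and the fixed weights value are illustrative assumptions, not taken from the original page; est_params follows the pattern shown in the other model examples.

```python
# Hedged sketch: tune KNN with the BO while passing one fixed hyperparameter.
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True)

atom = ATOMRegressor(X, y)
atom.run(
    models="KNN",
    metric="r2",
    n_calls=15,                          # BO iterations (illustrative)
    est_params={"weights": "distance"},  # passed directly to the estimator
)

print(atom.knn.best_params)  # best combination found by the BO
print(atom.knn.metric_test)  # score on the test set
```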
Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.knn.plot_permutation_importance() or atom.knn.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. 
The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"KNN\", metric=\"ME\", n_calls=20, bo_params={\"max_time\": 1000})","title":"K-Nearest Neighbors"},{"location":"API/models/knn/#k-nearest-neighbors-knn","text":"K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest neighbors vote. For regression, the target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Corresponding estimators are: KNeighborsClassifier for classification tasks. KNeighborsRegressor for regression tasks. Read more in sklearn's documentation .","title":"K-Nearest Neighbors (KNN)"},{"location":"API/models/knn/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. Dimensions: n_neighbors: int, default=5 Integer(1, 100, name=\"n_neighbors\") weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\")","title":"Hyperparameters"},{"location":"API/models/knn/#attributes","text":"","title":"Attributes"},{"location":"API/models/knn/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. 
train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/knn/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/knn/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/knn/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.knn.plot_permutation_importance() or atom.knn.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/knn/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"KNN\", metric=\"ME\", n_calls=20, bo_params={\"max_time\": 1000})","title":"Example"},{"location":"API/models/ksvm/","text":"Kernel-SVM (kSVM) The implementation of the Kernel (non-linear) Support Vector Machine is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using a Linear Support Vector Machine or a Stochastic Gradient descent model instead. 
The multiclass support is handled according to a one-vs-one scheme. Corresponding estimators are: SVC for classification tasks. SVR for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The degree parameter is only used when kernel = \"poly\". The gamma parameter is always set to \"scale\" when kernel = \"poly\". The coef0 parameter is only used when kernel = \"rbf\". The random_state parameter is set equal to that of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") kernel: str, default=\"rbf\" Categorical([\"poly\", \"rbf\", \"sigmoid\"], name=\"kernel\") degree: int, default=3 Integer(2, 5, name=\"degree\"). gamma: str, default=\"scale\" Categorical([\"scale\", \"auto\"], name=\"gamma\") coef0: float, default=0 Real(-1.0, 1.0, name=\"coef0\"). shrinking: bool, default=True Categorical([True, False], name=\"shrinking\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. 
time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. 
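The sketch below combines the scoring and save_estimator methods documented above for the kSVM model. The dataset, the metric names and the file name are illustrative assumptions; "cm" is one of the custom classifier metrics listed in the scoring parameters.

```python
# Hedged sketch: score the trained kSVM model and persist its estimator.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y)
atom.run(models="kSVM", metric="f1")

print(atom.ksvm.scoring("f1", dataset="train"))  # any sklearn SCORER name
print(atom.ksvm.scoring("cm"))                   # custom metric: confusion matrix

atom.ksvm.save_estimator("kSVM_estimator")  # writes the fitted estimator to a pickle file
```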
Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"kSVM\", metric=\"r2\", est_params={\"kernel\": \"rbf\"})","title":"Kernel-SVM"},{"location":"API/models/ksvm/#kernel-svm-ksvm","text":"The implementation of the Kernel (non-linear) Support Vector Machine is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using a Linear Support Vector Machine or a Stochastic Gradient descent model instead. The multiclass support is handled according to a one-vs-one scheme. Corresponding estimators are: SVC for classification tasks. SVR for regression tasks. Read more in sklearn's documentation .","title":"Kernel-SVM (kSVM)"},{"location":"API/models/ksvm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The degree parameter is only used when kernel = \"poly\". The gamma parameter is always set to \"scale\" when kernel = \"poly\". The coef0 parameter is only used when kernel = \"rbf\". The random_state parameter is set equal to that of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") kernel: str, default=\"rbf\" Categorical([\"poly\", \"rbf\", \"sigmoid\"], name=\"kernel\") degree: int, default=3 Integer(2, 5, name=\"degree\"). gamma: str, default=\"scale\" Categorical([\"scale\", \"auto\"], name=\"gamma\") coef0: float, default=0 Real(-1.0, 1.0, name=\"coef0\"). shrinking: bool, default=True Categorical([True, False], name=\"shrinking\")","title":"Hyperparameters"},{"location":"API/models/ksvm/#attributes","text":"","title":"Attributes"},{"location":"API/models/ksvm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ksvm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. 
metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ksvm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ksvm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.ksvm.plot_permutation_importance() or atom.ksvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. 
Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ksvm/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"kSVM\", metric=\"r2\", est_params={\"kernel\": \"rbf\"})","title":"Example"},{"location":"API/models/lasso/","text":"Lasso Regression (Lasso) Linear least squares with l1 regularization. Corresponding estimators are: Lasso for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. 
metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() or atom.lasso.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Lasso\")","title":"Lasso"},{"location":"API/models/lasso/#lasso-regression-lasso","text":"Linear least squares with l1 regularization. Corresponding estimators are: Lasso for regression tasks. Read more in sklearn's documentation .","title":"Lasso Regression (Lasso)"},{"location":"API/models/lasso/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. 
See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") selection: str, default=\"cyclic\" Categorical([\"cyclic\", \"random\"], name=\"selection\")","title":"Hyperparameters"},{"location":"API/models/lasso/#attributes","text":"","title":"Attributes"},{"location":"API/models/lasso/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lasso/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lasso/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. 
score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lasso/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lasso.plot_permutation_importance() or atom.lasso.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lasso/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Lasso\")","title":"Example"},{"location":"API/models/lda/","text":"Linear Discriminant Analysis (LDA) Linear Discriminant Analysis is a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. Corresponding estimators are: LinearDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The shrinkage parameter is not used when solver = \"svd\". Dimensions: solver: str, default=\"svd\" Categorical([\"svd\", \"lsqr\", \"eigen\"], name=\"solver\") shrinkage: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"shrinkage\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. 
n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lda.plot_permutation_importance() or atom.lda.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. 
The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LDA\")","title":"Linear Discriminant Analysis"},{"location":"API/models/lda/#linear-discriminant-analysis-lda","text":"Linear Discriminant Analysis is a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. Corresponding estimators are: LinearDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation .","title":"Linear Discriminant Analysis (LDA)"},{"location":"API/models/lda/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The shrinkage parameter is not used when solver = \"svd\". Dimensions: solver: str, default=\"svd\" Categorical([\"svd\", \"lsqr\", \"eigen\"], name=\"solver\") shrinkage: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"shrinkage\")","title":"Hyperparameters"},{"location":"API/models/lda/#attributes","text":"","title":"Attributes"},{"location":"API/models/lda/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lda/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lda/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lda/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lda.plot_permutation_importance() or atom.lda.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lda/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LDA\")","title":"Example"},{"location":"API/models/lgb/","text":"LightGBM (LGB) LightGBM is a gradient boosting model that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages: Faster training speed and higher efficiency. Lower memory usage. Better accuracy. Capable of handling large-scale data. Corresponding estimators are: LGBMClassifier for classification tasks. LGBMRegressor for regression tasks. Read more in LightGBM's documentation . 
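As a quick, hedged sketch of how such a model might be trained through atom (it assumes a feature matrix X and target y are already loaded; the metric and number of optimization calls are illustrative choices, not prescribed defaults):

from atom import ATOMClassifier

# Minimal sketch, not an official example: X and y are assumed to exist elsewhere.
atom = ATOMClassifier(X, y)
atom.run(models="LGB", metric="accuracy", n_calls=10)  # illustrative settings
print(atom.lgb.metric_test)   # metric score(s) on the test set
print(atom.lgb.best_params)   # hyperparameters selected by the bayesian optimization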
Note LightGBM allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=-1 Categorical([-1, \\*list(range(1, 10))], name=\"max_depth\") num_leaves: int, default=31 Integer(20, 40, name=\"num_leaves\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") min_child_samples: int, default=20 Integer(10, 30, name=\"min_child_samples\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_bytree: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_bytree\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. 
results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. 
\"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"LGB\", metric=\"r2\", n_calls=50, bo_params={\"base_estimator\": \"ET\"})","title":"LightGBM"},{"location":"API/models/lgb/#lightgbm-lgb","text":"LightGBM is a gradient boosting model that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages: Faster training speed and higher efficiency. Lower memory usage. Better accuracy. Capable of handling large-scale data. Corresponding estimators are: LGBMClassifier for classification tasks. LGBMRegressor for regression tasks. Read more in LightGBM's documentation . Note LightGBM allows early stopping to stop the training of unpromising models prematurely!","title":"LightGBM (LGB)"},{"location":"API/models/lgb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=-1 Categorical([-1, \\*list(range(1, 10))], name=\"max_depth\") num_leaves: int, default=31 Integer(20, 40, name=\"num_leaves\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") min_child_samples: int, default=20 Integer(10, 30, name=\"min_child_samples\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_level: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_level\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/lgb/#attributes","text":"","title":"Attributes"},{"location":"API/models/lgb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. 
target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lgb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lgb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lgb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X) . 
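For instance, a hedged sketch of such direct calls, assuming the usual sklearn-like prediction methods are exposed on the model (as the predict example above suggests) and that X_new is a hypothetical set of unseen rows with the same columns as X:

from atom import ATOMClassifier

# Sketch only: X, y and X_new are assumed to exist elsewhere.
atom = ATOMClassifier(X, y)
atom.run(models="LGB")
predictions = atom.lgb.predict(X_new)          # class predictions from the fitted estimator
probabilities = atom.lgb.predict_proba(X_new)  # class probabilities (classification tasks)
atom.lgb.plot_permutation_importance()         # a plot method called directly from the model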
The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lgb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"LGB\", metric=\"r2\", n_calls=50, bo_params={\"base_estimator\": \"ET\"})","title":"Example"},{"location":"API/models/lr/","text":"Logistic regression (LR) Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function. Corresponding estimators are: LogisticRegression for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
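A rough sketch of one such customization, assuming the trainer's run method accepts est_params (as these docs mention elsewhere for the MLP model); the chosen values are purely illustrative:

from atom import ATOMClassifier

# Hypothetical override of LogisticRegression defaults; X and y are assumed to exist.
atom = ATOMClassifier(X, y)
atom.run(
    models="LR",
    est_params={"penalty": "elasticnet", "solver": "saga", "l1_ratio": 0.5},
)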
The penalty parameter is automatically set to \"l2\" when penalty = \"none\" and solver = \"liblinear\". The penalty parameter is automatically set to \"l2\" when penalty = \"l1\" and solver != \"liblinear\" or \"saga\". The penalty parameter is automatically set to \"l2\" when penalty = \"elasticnet\" and solver != \"saga\". The C parameter is not used when penalty = \"none\". The l1_ratio parameter is only used when penalty = \"elasticnet\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") solver: str, default=\"lbfgs\" Categorical([\"lbfgs\", \"newton-cg\", \"liblinear\", \"sag\", \"saga\"], name=\"solver\") max_iter: int, default=100 Integer(100, 1000, name=\"max_iter\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. 
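To make these attributes concrete, a small hedged sketch of inspecting them after a run with hyperparameter tuning (the attribute names follow the list above; the metric and number of calls are arbitrary):

from atom import ATOMClassifier

# Sketch only: X and y are assumed to be defined elsewhere.
atom = ATOMClassifier(X, y)
atom.run(models="LR", metric="f1", n_calls=15)
print(atom.lr.bo.head())     # per-iteration BO information (params, estimator, score, ...)
print(atom.lr.best_params)   # best hyperparameter combination found by the BO
print(atom.lr.metric_test)   # metric score on the test set
print(atom.lr.results)       # summary series of the whole run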
Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lr.plot_permutation_importance() or atom.lr.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. 
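A brief, hedged sketch of calling this method (the metric names come from the list above; the setup lines assume X and y exist):

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.run(models="LR")
print(atom.lr.scoring())                       # final result(s) for the metric(s) used during training
print(atom.lr.scoring("roc_auc"))              # any metric from sklearn's SCORERS
print(atom.lr.scoring("cm", dataset="train"))  # custom metric: confusion matrix on the training set
print(atom.lr.scoring("tpr"))                  # custom metric: true positive rate on the test set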
method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LR\")","title":"Logistic Regression"},{"location":"API/models/lr/#logistic-regression-lr","text":"Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function. Corresponding estimators are: LogisticRegression for classification tasks. Read more in sklearn's documentation .","title":"Logistic regression (LR)"},{"location":"API/models/lr/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is automatically set to \"l2\" when penalty = \"none\" and solver = \"liblinear\". The penalty parameter is automatically set to \"l2\" when penalty = \"l1\" and solver != \"liblinear\" or \"saga\". The penalty parameter is automatically set to \"l2\" when penalty = \"elasticnet\" and solver != \"saga\". The C parameter is not used when penalty = \"none\". The l1_ratio parameter is only used when penalty = \"elasticnet\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") solver: str, default=\"lbfgs\" Categorical([\"lbfgs\", \"newton-cg\", \"liblinear\", \"sag\", \"saga\"], name=\"solver\") max_iter: int, default=100 Integer(100, 1000, name=\"max_iter\") l1_ratio: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\")","title":"Hyperparameters"},{"location":"API/models/lr/#attributes","text":"","title":"Attributes"},{"location":"API/models/lr/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lr/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lr/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lr/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lr.plot_permutation_importance() or atom.lr.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lr/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"LR\")","title":"Example"},{"location":"API/models/lsvm/","text":"Linear-SVM (lSVM) Similar to Kernel-SVM but with a linear kernel. Implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. The multiclass support is handled according to a one-vs-rest scheme. Corresponding estimators are: LinearSVC for classification tasks. LinearSVR for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is only used with LinearSVC. The penalty parameter is always set to \"l2\" when loss = \"hinge\". The dual parameter is automatically set to False when penalty = \"l1\" and loss = \"squared_hinge\". The random_state parameter is set equal to that of the training instance. Dimensions: loss: str classifier: default=\"squared_hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") penalty: str, default=\"l2\" Categorical([\"l1\", \"l2\"], name=\"penalty\"). Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. 
The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"lSVM\", metric=\"accuracy\", n_calls=10)","title":"Linear-SVM"},{"location":"API/models/lsvm/#linear-svm-lsvm","text":"Similar to Kernel-SVM but with a linear kernel. Implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. The multiclass support is handled according to a one-vs-rest scheme. Corresponding estimators are: LinearSVC for classification tasks. LinearSVR for regression tasks. Read more in sklearn's documentation .","title":"Linear-SVM (lSVM)"},{"location":"API/models/lsvm/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The penalty parameter is only used with LinearSVC. The penalty parameter is always set to \"l2\" when loss = \"hinge\". The dual parameter is automatically set to False when penalty = \"l1\" and loss = \"squared_hinge\". The random_state parameter is set equal to that of the training instance. 
Dimensions: loss: str classifier: default=\"squared_hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") penalty: str, default=\"l2\" Categorical([\"l1\", \"l2\"], name=\"penalty\").","title":"Hyperparameters"},{"location":"API/models/lsvm/#attributes","text":"","title":"Attributes"},{"location":"API/models/lsvm/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/lsvm/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/lsvm/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. 
predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/lsvm/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. 
If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/lsvm/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"lSVM\", metric=\"accuracy\", n_calls=10)","title":"Example"},{"location":"API/models/mlp/","text":"Multi-layer Perceptron (MLP) Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function by training on a dataset. Given a set of features and a target, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Corresponding estimators are: MLPClassifier for classification tasks. MLPRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The MLP optimizes between one and three hidden layers with the BO. For more layers, use est_params . The learning_rate and power_t parameters are only used when solver = \"lbfgs\". The learning_rate_init parameter is only used when solver != \"lbfgs\". The random_state parameter is set equal to that of the trainer. Dimensions: hidden_layer_sizes: tuple, default=(100,) Integer(10, 100, name=\"hidden_layer_1\") Integer(0, 100, name=\"hidden_layer_2\") Integer(0, 100, name=\"hidden_layer_3\") activation: str, default=\"relu\" Categorical([\"identity\", \"logistic\", \"tanh\", \"relu\"], name=\"activation\") solver: str, default=\"adam\" Categorical([\"lbfgs\", \"sgd\", \"adam\"], name=\"solver\") alpha: float, default=1e-4 Real(1e-4, 0.1, \"log-uniform\", name=\"alpha\") batch_size: int, default=200 Integer(8, 250, name=\"batch_size\") learning_rate: str, default=\"constant\" Categorical([\"constant\", \"invscaling\", \"adaptive\"], name=\"learning_rate\"). learning_rate_init: float, default=1e-3 Real(1e-3, 0.1, \"log-uniform\", name=\"learning_rate_init\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\"). max_iter: int, default=200 Integer(50, 500, name=\"max_iter\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. 
Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"MLP\", n_calls=20, est_params={\"solver\": \"sgd\", \"activation\": \"relu\"})","title":"Multi-layer Perceptron"},{"location":"API/models/mlp/#multi-layer-perceptron-mlp","text":"Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function by training on a dataset. Given a set of features and a target, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Corresponding estimators are: MLPClassifier for classification tasks. MLPRegressor for regression tasks. Read more in sklearn's documentation .","title":"Multi-layer Perceptron (MLP)"},{"location":"API/models/mlp/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The MLP optimizes between one and three hidden layers with the BO. For more layers, use est_params . The learning_rate and power_t parameters are only used when solver = \"lbfgs\". The learning_rate_init parameter is only used when solver != \"lbfgs\". The random_state parameter is set equal to that of the trainer. 
Dimensions: hidden_layer_sizes: tuple, default=(100,) Integer(10, 100, name=\"hidden_layer_1\") Integer(0, 100, name=\"hidden_layer_2\") Integer(0, 100, name=\"hidden_layer_3\") activation: str, default=\"relu\" Categorical([\"identity\", \"logistic\", \"tanh\", \"relu\"], name=\"activation\") solver: str, default=\"adam\" Categorical([\"lbfgs\", \"sgd\", \"adam\"], name=\"solver\") alpha: float, default=1e-4 Real(1e-4, 0.1, \"log-uniform\", name=\"alpha\") batch_size: int, default=200 Integer(8, 250, name=\"batch_size\") learning_rate: str, default=\"constant\" Categorical([\"constant\", \"invscaling\", \"adaptive\"], name=\"learning_rate\"). learning_rate_init: float, default=1e-3 Real(1e-3, 0.1, \"log-uniform\", name=\"learning_rate_init\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\"). max_iter: int, default=200 Integer(50, 500, name=\"max_iter\")","title":"Hyperparameters"},{"location":"API/models/mlp/#attributes","text":"","title":"Attributes"},{"location":"API/models/mlp/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/mlp/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. 
Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/mlp/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/mlp/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.mlp.plot_permutation_importance() or atom.mlp.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. 
\"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/mlp/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"MLP\", n_calls=20, est_params={\"solver\": \"sgd\", \"activation\": \"relu\"})","title":"Example"},{"location":"API/models/mnb/","text":"Multinomial Naive Bayes (MNB) Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially distributed data, and is one of the two classic Naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). Corresponding estimators are: MultinomialNB for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"MNB\", metric=\"precision\")","title":"Multinomial Naive Bayes"},{"location":"API/models/mnb/#multinomial-naive-bayes-mnb","text":"Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially distributed data, and is one of the two classic Naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). Corresponding estimators are: MultinomialNB for classification tasks. Read more in sklearn's documentation .","title":"Multinomial Naive Bayes (MNB)"},{"location":"API/models/mnb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") fit_prior: bool, default=True Categorical([True, False], name=\"fit_prior\")","title":"Hyperparameters"},{"location":"API/models/mnb/#attributes","text":"","title":"Attributes"},{"location":"API/models/mnb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/mnb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. 
time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/mnb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/mnb/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.mnb.plot_permutation_importance() or atom.mnb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. 
Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/mnb/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"MNB\", metric=\"precision\")","title":"Example"},{"location":"API/models/ols/","text":"Ordinary Least Squares (OLS) Ordinary Least Squares is just linear regression without any regularization. It fits a linear model with coefficients w = (w1, \u2026, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Corresponding estimators are: LinearRegression for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. OLS has no parameters to tune with the BO. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. 
mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS . dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"OLS\")","title":"Ordinary Least Squares"},{"location":"API/models/ols/#ordinary-least-squares-ols","text":"Ordinary Least Squares is just linear regression without any regularization. It fits a linear model with coefficients w = (w1, \u2026, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Corresponding estimators are: LinearRegression for regression tasks. 
Read more in sklearn's documentation .","title":"Ordinary Least Squares (OLS)"},{"location":"API/models/ols/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs parameter is set equal to that of the trainer. OLS has no parameters to tune with the BO.","title":"Hyperparameters"},{"location":"API/models/ols/#attributes","text":"","title":"Attributes"},{"location":"API/models/ols/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ols/#utility-attributes","text":"Attributes: estimator: class Estimator instance fitted on the complete training set. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: name: Name of the model. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ols/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ols/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X) . The remaining utility methods can be found hereunder: delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. 
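For instance, a short sketch of that workflow (assuming X and y hold your features and target; the file name is only an illustrative choice), calling the prediction and utility methods directly on the trained model:
from atom import ATOMRegressor
atom = ATOMRegressor(X, y)
atom.run(models=\"OLS\")
# Prediction methods are available directly on the model...
y_pred = atom.ols.predict(atom.X_test)
# ...and so are the utility methods listed above.
atom.ols.scoring(\"r2\", dataset=\"train\")
atom.ols.save_estimator(\"LinearRegression_model\")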
method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS . dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ols/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"OLS\")","title":"Example"},{"location":"API/models/pa/","text":"Passive Aggressive (PA) The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C. Corresponding estimators are: PassiveAggressiveClassifier for classification tasks. PassiveAggressiveRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") average: bool, default=False Categorical([True, False], name=\"average\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. 
List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. 
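As a hedged sketch of the calibrate method described above (assuming a fitted ATOMClassifier; the cv and method values are illustrative choices forwarded to CalibratedClassifierCV):
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.run(models=\"PA\", metric=\"f1\")
# Calibrate with 5-fold cross-validation on the training set; the kwargs
# are passed straight to sklearn's CalibratedClassifierCV.
atom.pa.calibrate(cv=5, method=\"sigmoid\")
# The calibrated classifier replaces the estimator attribute and the
# prediction attributes are reset.
print(atom.pa.estimator)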
method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/pa/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"PA\", metric=\"f1\")","title":"Example"},{"location":"API/models/pa/#passive-aggressive-pa","text":"The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C. Corresponding estimators are: PassiveAggressiveClassifier for classification tasks. PassiveAggressiveRegressor for regression tasks. Read more in sklearn's documentation .","title":"Passive Aggressive (PA)"},{"location":"API/models/pa/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: C: float, default=1.0 Real(1e-3, 100, \"log-uniform\", name=\"C\") loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"squared_hinge\"], name=\"loss\") regressor: default=\"epsilon_insensitive\" Categorical([\"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") average: bool, default=False Categorical([True, False], name=\"average\")","title":"Hyperparameters"},{"location":"API/models/pa/#attributes","text":"","title":"Attributes"},{"location":"API/models/pa/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. 
n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/pa/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/pa/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/pa/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.pa.plot_permutation_importance() or atom.pa.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. 
The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/pa/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"PA\", metric=\"f1\")","title":"Example"},{"location":"API/models/qda/","text":"Quadratic Discriminant Analysis (QDA) Quadratic Discriminant Analysis is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class, without assuming that all classes share the same covariance matrix. Corresponding estimators are: QuadraticDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: reg_param: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"reg_param\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. 
columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.qda.plot_permutation_importance() or atom.qda.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. 
save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"QDA\")","title":"Quadratic Discriminant Analysis"},{"location":"API/models/qda/#quadratic-discriminant-analysis-qda","text":"Quadratic Discriminant Analysis is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes\u2019 rule. The model fits a Gaussian density to each class. Corresponding estimators are: QuadraticDiscriminantAnalysis for classification tasks. Read more in sklearn's documentation .","title":"Quadratic Discriminant Analysis (QDA)"},{"location":"API/models/qda/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. Dimensions: reg_param: float, default=0 Categorical(np.linspace(0.0, 1.0, 11), name=\"reg_param\")","title":"Hyperparameters"},{"location":"API/models/qda/#attributes","text":"","title":"Attributes"},{"location":"API/models/qda/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
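To make the QDA hyperparameter section above concrete, the following sketch lets the bayesian optimization search the reg_param dimension listed there and then inspects a few of the resulting utility attributes. It assumes X and y hold a classification dataset; the n_calls and n_initial_points values are arbitrary.

```python
from atom import ATOMClassifier

# Assumes X and y hold a classification dataset.
atom = ATOMClassifier(X, y)

# Let the BO search the reg_param dimension (the call budget is arbitrary).
atom.run(models="QDA", metric="f1", n_calls=10, n_initial_points=5)

# Utility attributes populated by the run.
print(atom.qda.best_params)   # best reg_param found by the BO
print(atom.qda.metric_test)   # f1 score on the test set
```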
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/qda/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/qda/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set. decision_function_test: np.ndarray Decision function scores on the test set. score_train: np.float64 Model's score on the training set. 
score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/qda/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. atom.qda.plot_permutation_importance() or atom.qda.predict(X). The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/qda/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"QDA\")","title":"Example"},{"location":"API/models/rf/","text":"Random Forest (RF) Random forests are an ensemble learning method that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees' habit of overfitting to their training set. Corresponding estimators are: RandomForestClassifier for classification tasks. RandomForestRegressor for regression tasks. 
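The calibrate method documented above forwards any keyword to sklearn's CalibratedClassifierCV. A minimal sketch of the two usage modes, assuming X and y hold a classification dataset and that an independent evaluation set would exist before using cv=\"prefit\":

```python
from atom import ATOMClassifier

# Assumes X and y hold a classification dataset.
atom = ATOMClassifier(X, y)
atom.run(models="QDA", metric="f1")

# Default mode: the calibrator is fitted via cross-validation on the
# training data (cv and method are CalibratedClassifierCV keywords).
atom.qda.calibrate(cv=5, method="sigmoid")

# cv="prefit" would fit the calibrator on the test set instead, leaking it;
# only do this when another, independent set remains for evaluation.
# atom.qda.calibrate(cv="prefit")
```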
Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\") results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. 
If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"RF\", metric=\"mae\", n_calls=20, n_initial_points=10)","title":"Random Forest"},{"location":"API/models/rf/#random-forest-rf","text":"Random forests are an ensemble learning method that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees\" habit of overfitting to their training set. Corresponding estimators are: RandomForestClassifier for classification tasks. RandomForestRegressor for regression tasks. Read more in sklearn's documentation .","title":"Random Forest (RF)"},{"location":"API/models/rf/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The max_samples parameter is only used when bootstrap = True. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(10, 500, name=\"n_estimators\") criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") bootstrap: bool, default=False Categorical([True, False], name=\"bootstrap\") max_samples: float, default=0.9 Categorical(np.linspace(0.5, 0.9, 5), name=\"max_samples\") results: pd.DataFrame Dataframe of the training results with the model acronym as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Hyperparameters"},{"location":"API/models/rf/#attributes","text":"","title":"Attributes"},{"location":"API/models/rf/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/rf/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs.","title":"Utility attributes"},{"location":"API/models/rf/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/rf/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rf.plot_permutation_importance() or atom.rf.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. 
The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/rf/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"RF\", metric=\"mae\", n_calls=20, n_initial_points=10)","title":"Example"},{"location":"API/models/ridge/","text":"Ridge Classification/Regression (Ridge) Linear least squares with l2 regularization. Corresponding estimators are: RidgeClassifier for classification tasks. Ridge for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") solver: str, default=\"auto\" Categorical([\"auto\", \"svd\", \"cholesky\", \"lsqr\", \"sparse_cg\", \"sag\", \"saga\"], name=\"solver\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. 
columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ridge.plot_permutation_importance() or atom.ridge.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. 
The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Ridge\")","title":"Ridge"},{"location":"API/models/ridge/#ridge-classificationregression-ridge","text":"Linear least squares with l2 regularization. Corresponding estimators are: RidgeClassifier for classification tasks. Ridge for regression tasks. Read more in sklearn's documentation .","title":"Ridge Classification/Regression (Ridge)"},{"location":"API/models/ridge/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: alpha: float, default=1.0 Real(1e-3, 10, \"log-uniform\", name=\"alpha\") solver: str, default=\"auto\" Categorical([\"auto\", \"svd\", \"cholesky\", \"lsqr\", \"sparse_cg\", \"sag\", \"saga\"], name=\"solver\")","title":"Hyperparameters"},{"location":"API/models/ridge/#attributes","text":"","title":"Attributes"},{"location":"API/models/ridge/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. 
X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/ridge/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the estimator. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/ridge/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set. predict_proba_test: np.ndarray Predicted probabilities of the model on the test set. predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set. predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set. score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/ridge/#methods","text":"The majority of the plots and prediction methods can be called directly from the model, e.g. 
atom.ridge.plot_permutation_importance() or atom.ridge.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/ridge/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Ridge\")","title":"Example"},{"location":"API/models/rnn/","text":"Radius Nearest Neighbors (RNN) Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors are selected from within a given radius. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. Corresponding estimators are: RadiusNeighborsClassifier for classification tasks. RadiusNeighborsRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
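One way to customize these defaults without touching the BO is to pass fixed values through est_params when running the model. The sketch below mirrors the RNN example further down; the radius value is illustrative and X and y are assumed to be loaded.

```python
from atom import ATOMClassifier

# Assumes X and y hold a classification dataset.
atom = ATOMClassifier(X, y)

# Fix the radius instead of relying on the distance-based default.
atom.run(models="RNN", metric="precision", est_params={"radius": 3.5})

# The fitted RadiusNeighborsClassifier carries the fixed value.
print(atom.rnn.estimator.get_params()["radius"])  # 3.5
```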
The outlier_label parameter is set by default to \"most_frequent\" to avoid errors when encountering outliers. The n_jobs parameter is set equal to that of the trainer. Dimensions: radius: float, default=mean(distances) Real(min(distances), max(distances), name=\"radius\") Since the optimal radius depends hugely on the data, ATOM's RNN implementation doesn't use sklearn's default radius of 1, but instead calculates the [minkowsky distance](https://en.wikipedia.org/wiki/Minkowski_distance) between 10% of random samples in the training set and uses the mean of those distances as default radius. The lower and upper bounds of the radius' dimensions for the BO are given by the minimum and maximum value of the calculated distances. weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. 
time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() or atom.rnn.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. 
Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"RNN\", metric=\"precision\", est_params={\"radius\": 3.5})","title":"Radius Nearest Neighbors"},{"location":"API/models/rnn/#radius-nearest-neighbors-rnn","text":"Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors are selected from within a given radius. For regression, the target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. Corresponding estimators are: RadiusNeighborsClassifier for classification tasks. RadiusNeighborsRegressor for regression tasks. Read more in sklearn's documentation .","title":"Radius Nearest Neighbors (RNN)"},{"location":"API/models/rnn/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The outlier_label parameter is set by default to \"most_frequent\" to avoid errors when encountering outliers. The n_jobs parameter is set equal to that of the trainer. Dimensions: radius: float, default=mean(distances) Real(min(distances), max(distances), name=\"radius\") Since the optimal radius depends hugely on the data, ATOM's RNN implementation doesn't use sklearn's default radius of 1, but instead calculates the [minkowsky distance](https://en.wikipedia.org/wiki/Minkowski_distance) between 10% of random samples in the training set and uses the mean of those distances as default radius. The lower and upper bounds of the radius' dimensions for the BO are given by the minimum and maximum value of the calculated distances. weights: str, default=\"uniform\" Categorical([\"uniform\", \"distance\"], name=\"weights\") algorithm: str, default=\"auto\" Categorical([\"auto\", \"ball_tree\", \"kd_tree\", \"brute\"], name=\"algorithm\") leaf_size: int, default=30 Integer(20, 40, name=\"leaf_size\") p: int, default=2 Integer(1, 2, name=\"p\")","title":"Hyperparameters"},{"location":"API/models/rnn/#attributes","text":"","title":"Attributes"},{"location":"API/models/rnn/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/rnn/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. 
\"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/rnn/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/rnn/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.rnn.plot_permutation_importance() or atom.rnn.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. 
Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/rnn/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"RNN\", metric=\"precision\", est_params={\"radius\": 3.5})","title":"Example"},{"location":"API/models/sgd/","text":"Stochastic Gradient Descent (SGD) Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. Corresponding estimators are: SGDClassifier for classification tasks. SGDRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The l1_ratio parameter is only used when penalty = \"elasticnet\". The eta0 parameter is only used when learning_rate != \"optimal\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"log\", \"modified_huber\", \"squared_hinge\", \"perceptron\"], name=\"loss\") regressor: default=\"squared_loss\" Categorical([\"squared_loss\", \"huber\", \"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") alpha: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.15 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\"). 
epsilon: float, default=0.1 Real(1e-4, 1.0, \"log-uniform\", name=\"epsilon\") learning_rate: str, default=\"optimal\" Categorical([\"constant\", \"invscaling\", \"optimal\", \"adaptive\"], name = \"learning_rate\") eta0: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"eta0\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\") average: bool, default=False Categorical([True, False], name=\"average\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). 
decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"SGD\", metric=\"recall\", bo_params={\"cv\": 3})","title":"Stochastic Gradient Descent"},{"location":"API/models/sgd/#stochastic-gradient-descent-sgd","text":"Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. 
Corresponding estimators are: SGDClassifier for classification tasks. SGDRegressor for regression tasks. Read more in sklearn's documentation .","title":"Stochastic Gradient Descent (SGD)"},{"location":"API/models/sgd/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The l1_ratio parameter is only used when penalty = \"elasticnet\". The eta0 parameter is only used when learning_rate != \"optimal\". The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: loss: str classifier: default=\"hinge\" Categorical([\"hinge\", \"log\", \"modified_huber\", \"squared_hinge\", \"perceptron\"], name=\"loss\") regressor: default=\"squared_loss\" Categorical([\"squared_loss\", \"huber\", \"epsilon_insensitive\", \"squared_epsilon_insensitive\"], name=\"loss\") penalty: str, default=\"l2\" Categorical([\"none\", \"l1\", \"l2\", \"elasticnet\"], name=\"penalty\") alpha: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"alpha\") l1_ratio: float, default=0.15 Categorical(np.linspace(0.1, 0.9, 9), name=\"l1_ratio\"). epsilon: float, default=0.1 Real(1e-4, 1.0, \"log-uniform\", name=\"epsilon\") learning_rate: str, default=\"optimal\" Categorical([\"constant\", \"invscaling\", \"optimal\", \"adaptive\"], name = \"learning_rate\") eta0: float, default=1e-4 Real(1e-4, 1.0, \"log-uniform\", name=\"eta0\"). power_t: float, default=0.5 Categorical(np.linspace(0.1, 0.9, 9), name=\"power_t\") average: bool, default=False Categorical([True, False], name=\"average\")","title":"Hyperparameters"},{"location":"API/models/sgd/#attributes","text":"","title":"Attributes"},{"location":"API/models/sgd/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/sgd/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. 
metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/sgd/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. decision_function_train: np.ndarray Decision function scores on the training set (only if classifier). decision_function_test: np.ndarray Decision function scores on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/sgd/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.sgd.plot_permutation_importance() or atom.sgd.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/sgd/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(models=\"SGD\", metric=\"recall\", bo_params={\"cv\": 3})","title":"Example"},{"location":"API/models/tree/","text":"Decision Tree (Tree) A single decision tree classifier/regressor. Corresponding estimators are: DecisionTreeClassifier for classification tasks. DecisionTreeRegressor for regression tasks. Read more in sklearn's documentation . Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") splitter: str, default=\"best\" Categorical([\"best\", \"random\"], name=\"splitter\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0.0 Real(0, 0.035, name=\"ccp_alpha\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. 
best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.tree.plot_permutation_importance() or atom.tree.predict(X) .The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. 
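As a quick illustration, here is a minimal, hypothetical sketch of calibrating a model through atom (it assumes X and y hold a binary classification dataset and that the Decision Tree model has been run; the keyword arguments shown are simply forwarded to sklearn's CalibratedClassifierCV):

```python
from atom import ATOMClassifier

# X and y are assumed to hold a binary classification dataset
atom = ATOMClassifier(X, y)
atom.run(models="Tree", metric="f1")

# Calibrate the winning estimator; kwargs are passed to CalibratedClassifierCV
atom.tree.calibrate(method="sigmoid", cv=5)

# The prediction attributes were reset, so they are recalculated on first access
print(atom.tree.predict_proba_test[:5])
```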
method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Tree\", metric=\"MSLE\")","title":"Decision Tree"},{"location":"API/models/tree/#decision-tree-tree","text":"A single decision tree classifier/regressor. Corresponding estimators are: DecisionTreeClassifier for classification tasks. DecisionTreeRegressor for regression tasks. Read more in sklearn's documentation .","title":"Decision Tree (Tree)"},{"location":"API/models/tree/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The random_state parameter is set equal to that of the trainer. Dimensions: criterion: str classifier: default=\"gini\" Categorical([\"gini\", \"entropy\"], name=\"criterion\") regressor: default=\"mse\" Categorical([\"mse\", \"mae\", \"friedman_mse\"], name=\"criterion\") splitter: str, default=\"best\" Categorical([\"best\", \"random\"], name=\"splitter\") max_depth: int or None, default=None Categorical([None, \\*list(range(1, 10))], name=\"max_depth\") min_samples_split: int, default=2 Integer(2, 20, name=\"min_samples_split\") min_samples_leaf: int, default=1 Integer(1, 20, name=\"min_samples_leaf\") max_features: float or None, default=None Categorical([None, \\*np.linspace(0.5, 0.9, 5)], name=\"max_features\") ccp_alpha: float, default=0.0 Real(0, 0.035, name=\"ccp_alpha\")","title":"Hyperparameters"},{"location":"API/models/tree/#attributes","text":"","title":"Attributes"},{"location":"API/models/tree/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. 
shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/tree/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/tree/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/tree/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. 
atom.tree.plot_permutation_importance() or atom.tree.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/tree/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"Tree\", metric=\"MSLE\")","title":"Example"},{"location":"API/models/xgb/","text":"XGBoost (XGB) XGBoost is an optimized distributed gradient boosting model designed to be highly efficient, flexible and portable. XGBoost provides parallel tree boosting that solves many data science problems in a fast and accurate way. Corresponding estimators are: XGBClassifier for classification tasks. XGBRegressor for regression tasks. Read more in XGBoost's documentation . Note XGBoost allows early stopping to stop the training of unpromising models prematurely! Hyperparameters By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. 
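As a hedged sketch of one way to customize these hyperparameters (assuming X and y hold a regression dataset; est_params is used here, as in the other examples in these docs, to fix estimator parameters while the remaining dimensions are left to the BO):

```python
from atom import ATOMRegressor

# X and y are assumed to hold a regression dataset
atom = ATOMRegressor(X, y)

# Fix some XGBoost parameters via est_params; the BO tunes the remaining dimensions
atom.run(
    models="XGB",
    metric="r2",
    n_calls=15,
    est_params={"max_depth": 4, "subsample": 0.8},
)
```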
The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=6 Integer(1, 10, name=\"max_depth\") gamma: float, default=0.0 Real(0, 1.0, name=\"gamma\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_tree: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_tree\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=1.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\") Attributes Data attributes Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. 
std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Prediction attributes The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set. Methods The majority of the plots and prediction methods can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". 
**kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"XGB\", metric=\"me\", n_calls=25, bo_params={\"cv\": 1})","title":"XGBoost"},{"location":"API/models/xgb/#xgboost-xgb","text":"XGBoost is an optimized distributed gradient boosting model designed to be highly efficient, flexible and portable. XGBoost provides parallel tree boosting that solves many data science problems in a fast and accurate way. Corresponding estimators are: XGBClassifier for classification tasks. XGBRegressor for regression tasks. Read more in XGBoost's documentation . Note XGBoost allows early stopping to stop the training of unpromising models prematurely!","title":"XGBoost (XGB)"},{"location":"API/models/xgb/#hyperparameters","text":"By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them. The n_jobs and random_state parameters are set equal to those of the trainer. Dimensions: n_estimators: int, default=100 Integer(20, 500, name=\"n_estimators\") learning_rate: float, default=0.1 Real(0.01, 1.0, \"log-uniform\", name=\"learning_rate\") max_depth: int, default=6 Integer(1, 10, name=\"max_depth\") gamma: float, default=0.0 Real(0, 1.0, name=\"gamma\") min_child_weight: int, default=1 Integer(1, 20, name=\"min_child_weight\") subsample: float, default=1.0 Categorical(np.linspace(0.5, 1.0, 6), name=\"subsample\") colsample_by_tree: float, default=1.0 Categorical(np.linspace(0.3, 1.0, 8), name=\"colsample_by_tree\") reg_alpha: float, default=0.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_alpha\") reg_lambda: float, default=1.0 Categorical([0, 0.01, 0.1, 1, 10, 100], name=\"reg_lambda\")","title":"Hyperparameters"},{"location":"API/models/xgb/#attributes","text":"","title":"Attributes"},{"location":"API/models/xgb/#data-attributes","text":"Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/models/xgb/#utility-attributes","text":"Attributes: bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: \"params\": Parameters used in the model. \"estimator\": Estimator used for this iteration (fitted on last cross-validation). \"score\": Score of the chosen metric. List of scores for multi-metric. \"time_iteration\": Time spent on this iteration. \"time\": Total time spent since the start of the BO. best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. time_bo: str Time it took to run the bayesian optimization algorithm. metric_bo: float or list Best metric score(s) on the BO. time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. metric_train: float or list Metric score(s) on the training set. metric_test: float or list Metric score(s) on the test set. evals: dict Dictionary of the metric calculated during training. The metric is provided by the estimator's package and is different for every task. Available keys are: \"metric\": Name of the metric. \"train\": List of scores calculated on the training set. \"test\": List of scores calculated on the test set. metric_bagging: list Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. mean_bagging: float or list Mean of the bagging's results. List of values for multi-metric runs. std_bagging: float or list Standard deviation of the bagging's results. List of values for multi-metric runs. results: pd.Series Series of the training results. Columns include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/models/xgb/#prediction-attributes","text":"The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory. Prediction attributes: predict_train: np.ndarray Predictions of the model on the training set. predict_test: np.ndarray Predictions of the model on the test set. predict_proba_train: np.ndarray Predicted probabilities of the model on the training set (only if classifier). predict_proba_test: np.ndarray Predicted probabilities of the model on the test set (only if classifier). predict_log_proba_train: np.ndarray Predicted log probabilities of the model on the training set (only if classifier). predict_log_proba_test: np.ndarray Predicted log probabilities of the model on the test set (only if classifier). score_train: np.float64 Model's score on the training set. score_test: np.float64 Model's score on the test set.","title":"Prediction attributes"},{"location":"API/models/xgb/#methods","text":"The majority of the plots and prediction methods can be called directly from the models, e.g. atom.xgb.plot_permutation_importance() or atom.xgb.predict(X) . The remaining utility methods can be found hereunder: calibrate Calibrate the model. delete Delete the model from the trainer. rename Change the model's tag. reset_predictions Clear all the prediction attributes. scoring Get the score for a specific metric. save_estimator Save the estimator to a pickle file. method calibrate (**kwargs) [source] Applies probability calibration on the estimator. The calibration is done using the CalibratedClassifierCV class from sklearn. The calibrator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. 
After calibrating, all prediction attributes will reset. Only if classifier. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method delete () [source] Delete the model from the trainer. method rename (name=None) [source] Change the model's tag. Note that the acronym always stays at the beginning of the model's name. Parameters: name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. method reset_predictions () [source] Clear all the prediction attributes . Use this method to free some memory before saving the model. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Get the scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's SCORERS or one of the following custom metrics (only if classifier): \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the final results for this model (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. Returns: score: float or np.ndarray Model's score for the selected metric. method save_estimator (filename=None) [source] Save the estimator to a pickle file. Parameters: filename: str or None, optional (default=None) Name of the file to save. If None or \"auto\", the estimator's __name__ is used.","title":"Methods"},{"location":"API/models/xgb/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(models=\"XGB\", metric=\"me\", n_calls=25, bo_params={\"cv\": 1})","title":"Example"},{"location":"API/plots/bar_plot/","text":"bar_plot method bar_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature column is plotted. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). 
If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's bar plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.bar_plot() atom.bar_plot(index=120)","title":"bar_plot"},{"location":"API/plots/bar_plot/#bar_plot","text":"method bar_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature column is plotted. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's bar plot .","title":"bar_plot"},{"location":"API/plots/bar_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.bar_plot() atom.bar_plot(index=120)","title":"Example"},{"location":"API/plots/beeswarm_plot/","text":"beeswarm_plot method beeswarm_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The beeswarm plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). 
If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's beeswarm plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.beeswarm_plot()","title":"beeswarm_plot"},{"location":"API/plots/beeswarm_plot/#beeswarm_plot","text":"method beeswarm_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The beeswarm plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's beeswarm plot .","title":"beeswarm_plot"},{"location":"API/plots/beeswarm_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.beeswarm_plot()","title":"Example"},{"location":"API/plots/decision_plot/","text":"decision_plot method decision_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple predictions are plotted together, feature values will not be printed. Plotting too many predictions together will make the plot unintelligible. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, select all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. 
title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's decision plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.decision_plot() atom.decision_plot(index=120)","title":"decision_plot"},{"location":"API/plots/decision_plot/#decision_plot","text":"method decision_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple predictions are plotted together, feature values will not be printed. Plotting too many predictions together will make the plot unintelligible. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, select all rows in the test set. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's decision plot .","title":"decision_plot"},{"location":"API/plots/decision_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.decision_plot() atom.decision_plot(index=120)","title":"Example"},{"location":"API/plots/force_plot/","text":"force_plot method force_plot (models=None, index=None, target=1, title=None, figsize=(14, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use matplotlib=True (this option is only available when only a single sample is plotted). Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. 
If None, it selects all rows in the test set. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(14, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If matplotlib=False, the figure is saved as an html file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's force plot . Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"lr\") atom.force_plot(index=120, matplotlib=True, filename=\"force_plot\")","title":"force_plot"},{"location":"API/plots/force_plot/#force_plot","text":"method force_plot (models=None, index=None, target=1, title=None, figsize=(14, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use matplotlib=True (this option is only available when only a single sample is plotted). Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int, tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(14, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If matplotlib=False, the figure is saved as an html file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's force plot .","title":"force_plot"},{"location":"API/plots/force_plot/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"lr\") atom.force_plot(index=120, matplotlib=True, filename=\"force_plot\")","title":"Example"},{"location":"API/plots/heatmap_plot/","text":"heatmap_plot method heatmap_plot (models=None, index=None, show=None, target=1, title=None, figsize=(8, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original feature values but by their explanations. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. 
The heatmap plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's heatmap plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.heatmap_plot()","title":"heatmap_plot"},{"location":"API/plots/heatmap_plot/#heatmap_plot","text":"method heatmap_plot (models=None, index=None, show=None, target=1, title=None, figsize=(8, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original feature values but by their explanations. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The heatmap plot does not support plotting a single sample. show: int or None, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's heatmap plot .","title":"heatmap_plot"},{"location":"API/plots/heatmap_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.heatmap_plot()","title":"Example"},{"location":"API/plots/plot_bo/","text":"plot_bo method plot_bo (models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True) [source] Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by bo_params={\"plot_bo\": True} while running the optimization. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline that used bayesian optimization are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. 
title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 8)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LDA\", \"LGB\"], metric=\"f1\", n_calls=24, n_initial_points=10) atom.plot_bo()","title":"plot_bo"},{"location":"API/plots/plot_bo/#plot_bo","text":"method plot_bo (models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True) [source] Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by bo_params={\"plot_bo\": True} while running the optimization. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline that used bayesian optimization are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 8)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_bo"},{"location":"API/plots/plot_bo/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LDA\", \"LGB\"], metric=\"f1\", n_calls=24, n_initial_points=10) atom.plot_bo()","title":"Example"},{"location":"API/plots/plot_calibration/","text":"plot_calibration method plot_calibration (models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True) [source] Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approx. 80% actually belong to the positive class. Read more in sklearn's documentation . This figure shows two plots: the calibration curve, where the x-axis represents the average predicted probability in each bin and the y-axis is the fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin); and a distribution of all predicted probabilities of the classifier. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. n_bins: int, optional (default=10) Number of bins for the calibration calculation and the histogram. Minimum of 5 required. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"LR\", \"LGB\"], metric=\"average_precision\") atom.plot_calibration()","title":"plot_calibration"},{"location":"API/plots/plot_calibration/#plot_calibration","text":"method plot_calibration (models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True) [source] Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approx. 80% actually belong to the positive class. Read more in sklearn's documentation . This figure shows two plots: the calibration curve, where the x-axis represents the average predicted probability in each bin and the y-axis is the fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin); and a distribution of all predicted probabilities of the classifier. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. n_bins: int, optional (default=10) Number of bins for the calibration calculation and the histogram. Minimum of 5 required. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_calibration"},{"location":"API/plots/plot_calibration/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"LR\", \"LGB\"], metric=\"average_precision\") atom.plot_calibration()","title":"Example"},{"location":"API/plots/plot_components/","text":"plot_components method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per component. Only available if PCA was applied on the data. Parameters: show: int or None, optional (default=None) Number of components to show. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of components shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_components()","title":"plot_components"},{"location":"API/plots/plot_components/#plot_components","text":"method plot_components (show=None, title=None, figsize=None, filename=None, display=True) [source] Plot the explained variance ratio per component. Only available if PCA was applied on the data. Parameters: show: int or None, optional (default=None) Number of components to show. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of components shown. 
filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_components"},{"location":"API/plots/plot_components/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_components()","title":"Example"},{"location":"API/plots/plot_confusion_matrix/","text":"plot_confusion_matrix method plot_confusion_matrix (models=None, dataset=\"test\", normalize=False, title=None, figsize=None, filename=None, display=True) [source] Plot a model's confusion matrix. Only for classification tasks. For 1 model: plot the confusion matrix in a heatmap. For multiple models: compare TP, FP, FN and TN in a barplot (not implemented for multiclass classification tasks). Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the confusion matrix. Options are \"train\" or \"test\". normalize: bool, optional (default=False) Whether to normalize the matrix. Only for the heatmap plot. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to plot type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"Bag\"]) atom.Tree.plot_confusion_matrix(normalize=True) atom.plot_confusion_matrix()","title":"plot_confusion_matrix"},{"location":"API/plots/plot_confusion_matrix/#plot_confusion_matrix","text":"method plot_confusion_matrix (models=None, dataset=\"test\", normalize=False, title=None, figsize=None, filename=None, display=True) [source] Plot a model's confusion matrix. Only for classification tasks. For 1 model: plot the confusion matrix in a heatmap. For multiple models: compare TP, FP, FN and TN in a barplot (not implemented for multiclass classification tasks). Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the confusion matrix. Options are \"train\" or \"test\". normalize: bool, optional (default=False) Whether to normalize the matrix. Only for the heatmap plot. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to plot type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_confusion_matrix"},{"location":"API/plots/plot_confusion_matrix/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"Bag\"]) atom.Tree.plot_confusion_matrix(normalize=True) atom.plot_confusion_matrix()","title":"Example"},{"location":"API/plots/plot_correlation/","text":"plot_correlation method plot_correlation (columns=None, method=\"pearson\", title=None, figsize=(8, 7), filename=None, display=True) [source] Plot the data's correlation matrix. 
Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. method: str, optional (default=\"pearson\") Method of correlation. Choose from \"pearson\", \"kendall\" or \"spearman\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 7)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_correlation()","title":"plot_correlation"},{"location":"API/plots/plot_correlation/#plot_correlation","text":"method plot_correlation (columns=None, method=\"pearson\", title=None, figsize=(8, 7), filename=None, display=True) [source] Plot the data's correlation matrix. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. method: str, optional (default=\"pearson\") Method of correlation. Choose from \"pearson\", \"kendall\" or \"spearman\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(8, 7)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_correlation"},{"location":"API/plots/plot_correlation/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_correlation()","title":"Example"},{"location":"API/plots/plot_distribution/","text":"plot_distribution method plot_distribution (columns=0, distribution=None, show=None, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot column distributions. Additionally, it is possible to plot any of scipy.stats probability distributions fitted to the column. Missing values are ignored. Tip Use atom's distribution method to check which distribution fits the column best. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. It is only possible to plot one categorical column. If more than just the one categorical column is selected, all categorical columns are ignored. distribution: str, sequence or None, optional (default=None) Names of the scipy.stats distributions to fit to the column. If None, no distribution is fitted. Only for numerical columns. show: int or None, optional (default=None) Number of classes (ordered by number of occurrences) to show in the plot. None to show all. Only for categorical columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the plot's type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's histplot . 
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_distribution(columns=[1, 2]) # With numerical columns atom.plot_distribution(columns=\"mean radius\", distribution=[\"norm\", \"triang\", \"pearson3\"]) # With fitted distributions atom.plot_distribution(columns=\"Location\", show=11) # With categorical columns","title":"plot_distribution"},{"location":"API/plots/plot_distribution/#plot_distribution","text":"method plot_distribution (columns=0, distribution=None, show=None, title=None, figsize=None, filename=None, display=True, **kwargs) [source] Plot column distributions. Additionally, it is possible to plot any of scipy.stats probability distributions fitted to the column. Missing values are ignored. Tip Use atom's distribution method to check which distribution fits the column best. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. It is only possible to plot one categorical column. If more than just the one categorical column is selected, all categorical columns are ignored. distribution: str, sequence or None, optional (default=None) Names of the scipy.stats distributions to fit to the column. If None, no distribution is fitted. Only for numerical columns. show: int or None, optional (default=None) Number of classes (ordered by number of occurrences) to show in the plot. None to show all. Only for categorical columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the plot's type. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's histplot .","title":"plot_distribution"},{"location":"API/plots/plot_distribution/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_distribution(columns=[1, 2]) # With numerical columns atom.plot_distribution(columns=\"mean radius\", distribution=[\"norm\", \"triang\", \"pearson3\"]) # With fitted distributions atom.plot_distribution(columns=\"Location\", show=11) # With categorical columns","title":"Example"},{"location":"API/plots/plot_errors/","text":"plot_errors method plot_errors (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This plot can be useful to detect noise or heteroscedasticity along a range of the target domain. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the errors. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_errors()","title":"plot_errors"},{"location":"API/plots/plot_errors/#plot_errors","text":"method plot_errors (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This plot can be useful to detect noise or heteroscedasticity along a range of the target domain. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the errors. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_errors"},{"location":"API/plots/plot_errors/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_errors()","title":"Example"},{"location":"API/plots/plot_evals/","text":"plot_evals method plot_evals (models=None, dataset=\"both\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation ( XGB , LGB , CatB ). The metric is provided by the estimator's package and is different for every model and every task. For this reason, the method only allows plotting one model at a time. Parameters: models: str, sequence or None, optional (default=None) Name of the model to plot. If None, all models in the pipeline are selected. Note that leaving the default option could raise an exception if there are multiple models in the pipeline. To avoid this, call the plot from a model, e.g. atom.lgb.plot_evals() . dataset: str, optional (default=\"both\") Data set on which to calculate the evaluation curves. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"Bag\", \"LGB\"]) atom.lgb.plot_evals()","title":"plot_evals"},{"location":"API/plots/plot_evals/#plot_evals","text":"method plot_evals (models=None, dataset=\"both\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation ( XGB , LGB , CatB ). The metric is provided by the estimator's package and is different for every model and every task. For this reason, the method only allows plotting one model at a time. Parameters: models: str, sequence or None, optional (default=None) Name of the model to plot. If None, all models in the pipeline are selected. 
Note that leaving the default option could raise an exception if there are multiple models in the pipeline. To avoid this, call the plot from a model, e.g. atom.lgb.plot_evals() . dataset: str, optional (default=\"both\") Data set on which to calculate the evaluation curves. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_evals"},{"location":"API/plots/plot_evals/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"Bag\", \"LGB\"]) atom.lgb.plot_evals()","title":"Example"},{"location":"API/plots/plot_feature_importance/","text":"plot_feature_importance method plot_feature_importance (models=None, show=None, title=None, figsize=None, filename=None, display=True) [source] Plot a tree-based model's feature importance. The importances are normalized in order to be able to compare them between models. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\"], metric=\"recall_weighted\") atom.RF.plot_feature_importance(show=11, filename=\"random_forest_importance\")","title":"plot_feature_importance"},{"location":"API/plots/plot_feature_importance/#plot_feature_importance","text":"method plot_feature_importance (models=None, show=None, title=None, figsize=None, filename=None, display=True) [source] Plot a tree-based model's feature importance. The importances are normalized in order to be able to compare them between models. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
display: bool, optional (default=True) Whether to render the plot.","title":"plot_feature_importance"},{"location":"API/plots/plot_feature_importance/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\"], metric=\"recall_weighted\") atom.RF.plot_feature_importance(show=11, filename=\"random_forest_importance\")","title":"Example"},{"location":"API/plots/plot_gains/","text":"plot_gains method plot_gains (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the cumulative gains curve. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the gains curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_gains(filename=\"cumulative_gains_curve\")","title":"plot_gains"},{"location":"API/plots/plot_gains/#plot_gains","text":"method plot_gains (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the cumulative gains curve. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the gains curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_gains"},{"location":"API/plots/plot_gains/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_gains(filename=\"cumulative_gains_curve\")","title":"Example"},{"location":"API/plots/plot_learning_curve/","text":"plot_learning_curve method plot_learning_curve (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using train sizing . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
Example import numpy as np from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.train_sizing([\"GNB\", \"LDA\"], metric=\"accuracy\", train_sizes=np.linspace(0.1, 1.0, 9), bagging=5) atom.plot_learning_curve()","title":"plot_learning_curve"},{"location":"API/plots/plot_learning_curve/#plot_learning_curve","text":"method plot_learning_curve (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using train sizing . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_learning_curve"},{"location":"API/plots/plot_learning_curve/#example","text":"import numpy as np from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.train_sizing([\"GNB\", \"LDA\"], metric=\"accuracy\", train_sizes=np.linspace(0.1, 1.0, 9), bagging=5) atom.plot_learning_curve()","title":"Example"},{"location":"API/plots/plot_lift/","text":"plot_lift method plot_lift (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the lift curve. Only for binary classification. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the lift curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_lift(filename=\"lift_curve\")","title":"plot_lift"},{"location":"API/plots/plot_lift/#plot_lift","text":"method plot_lift (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the lift curve. Only for binary classification. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the lift curve. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
display: bool, optional (default=True) Whether to render the plot.","title":"plot_lift"},{"location":"API/plots/plot_lift/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"GNB\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_lift(filename=\"lift_curve\")","title":"Example"},{"location":"API/plots/plot_partial_dependence/","text":"plot_partial_dependence method plot_partial_dependence (models=None, features=None, target=None, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the partial dependence of features. The partial dependence of a feature (or a set of features) corresponds to the average response of the model for each possible value of the feature. Two-way partial dependence plots are plotted as contour plots (only allowed for single model plots). The deciles of the feature values will be shown with tick marks on the x-axes for one-way plots, and on both axes for two-way plots. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. features: int, str, list, tuple or None, optional (default=None) Features or feature pairs (name or index) to get the partial dependence from. Maximum of 3 allowed. If None, it uses the top 3 features if the feature_importance attribute is defined else it uses the first 3 features in the dataset. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=6) atom.run([\"Tree\", \"Bag\"], metric=\"precision\") atom.plot_partial_dependence() atom.tree.plot_partial_dependence(features=[0, 1, (1, 3)])","title":"plot_partial_dependence"},{"location":"API/plots/plot_partial_dependence/#plot_partial_dependence","text":"method plot_partial_dependence (models=None, features=None, target=None, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the partial dependence of features. The partial dependence of a feature (or a set of features) corresponds to the average response of the model for each possible value of the feature. Two-way partial dependence plots are plotted as contour plots (only allowed for single model plots). The deciles of the feature values will be shown with tick marks on the x-axes for one-way plots, and on both axes for two-way plots. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. features: int, str, list, tuple or None, optional (default=None) Features or feature pairs (name or index) to get the partial dependence from. Maximum of 3 allowed. If None, it uses the top 3 features if the feature_importance attribute is defined else it uses the first 3 features in the dataset. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. 
figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_partial_dependence"},{"location":"API/plots/plot_partial_dependence/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=6) atom.run([\"Tree\", \"Bag\"], metric=\"precision\") atom.plot_partial_dependence() atom.tree.plot_partial_dependence(features=[0, 1, (1, 3)])","title":"Example"},{"location":"API/plots/plot_pca/","text":"plot_pca method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. Only available if PCA was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_pca()","title":"plot_pca"},{"location":"API/plots/plot_pca/#plot_pca","text":"method plot_pca (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the explained variance ratio vs the number of components. Only available if PCA was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_pca"},{"location":"API/plots/plot_pca/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"PCA\", n_features=11) atom.plot_pca()","title":"Example"},{"location":"API/plots/plot_permutation_importance/","text":"plot_permutation_importance method plot_permutation_importance (models=None, show=None, n_repeats=10, title=None, figsize=None, filename=None, display=True) [source] Plot the feature permutation importance of models. Calculating all permutations can be time-consuming, especially if n_repeats is high. They are stored under the attribute permutations . This means that if a plot is repeated for the same model with the same n_repeats , it will be considerably faster. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. n_repeats: int, optional (default=10) Number of times to permute each feature. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. 
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"LDA\"], metric=\"average_precision\") atom.lda.plot_permutation_importance(show=10, n_repeats=7)","title":"plot_permutation_importance"},{"location":"API/plots/plot_permutation_importance/#plot_permutation_importance","text":"method plot_permutation_importance (models=None, show=None, n_repeats=10, title=None, figsize=None, filename=None, display=True) [source] Plot the feature permutation importance of models. Calculating all permutations can be time-consuming, especially if n_repeats is high. They are stored under the attribute permutations . This means that if a plot is repeated for the same model with the same n_repeats , it will be considerably faster. The feature_importance attribute is updated with the extracted importance ranking. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. show: int, optional (default=None) Number of features (ordered by importance) to show in the plot. None to show all. n_repeats: int, optional (default=10) Number of times to permute each feature. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_permutation_importance"},{"location":"API/plots/plot_permutation_importance/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"LDA\"], metric=\"average_precision\") atom.lda.plot_permutation_importance(show=10, n_repeats=7)","title":"Example"},{"location":"API/plots/plot_pipeline/","text":"plot_pipeline method plot_pipeline (show_params=True, branch=None, title=None, figsize=None, filename=None, display=True) [source] Plot a diagram of every estimator in a branch. Parameters: show_params: bool, optional (default=True) Whether to show the parameters used for every estimator. branch: str or None, optional (default=None) Name of the branch to plot. If None, plot the current branch. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the length of the pipeline. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"median\", strat_cat=\"drop\", min_frac_rows=0.8) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.02) atom.prune(strategy=\"drop\", max_sigma=4, include_target=False) atom.feature_selection( strategy=\"PCA\", n_features=10, max_frac_repeated=1., max_correlation=0.7 ) atom.plot_pipeline()","title":"plot_pipeline"},{"location":"API/plots/plot_pipeline/#plot_pipeline","text":"method plot_pipeline (show_params=True, branch=None, title=None, figsize=None, filename=None, display=True) [source] Plot a diagram of every estimator in a branch. Parameters: show_params: bool, optional (default=True) Whether to show the parameters used for every estimator. branch: str or None, optional (default=None) Name of the branch to plot. 
If None, plot the current branch. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the length of the pipeline. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_pipeline"},{"location":"API/plots/plot_pipeline/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.impute(strat_num=\"median\", strat_cat=\"drop\", min_frac_rows=0.8) atom.encode(strategy=\"LeaveOneOut\", max_onehot=8, frac_to_other=0.02) atom.prune(strategy=\"drop\", max_sigma=4, include_target=False) atom.feature_selection( strategy=\"PCA\", n_features=10, max_frac_repeated=1., max_correlation=0.7 ) atom.plot_pipeline()","title":"Example"},{"location":"API/plots/plot_prc/","text":"plot_prc method plot_prc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the precision-recall curve. The legend shows the average precision (AP) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"average_precision\") atom.plot_prc()","title":"plot_prc"},{"location":"API/plots/plot_prc/#plot_prc","text":"method plot_prc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the precision-recall curve. The legend shows the average precision (AP) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_prc"},{"location":"API/plots/plot_prc/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"average_precision\") atom.plot_prc()","title":"Example"},{"location":"API/plots/plot_probabilities/","text":"plot_probabilities method plot_probabilities (models=None, dataset=\"test\", target=1, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the probability distribution of the classes in the target column. Only for classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. 
If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". target: int or str, optional (default=1) Probability of being that class in the target column as index or name. Only for multiclass classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y=\"RainTomorrow\") atom.run(\"rf\") atom.plot_probabilities()","title":"plot_probabilities"},{"location":"API/plots/plot_probabilities/#plot_probabilities","text":"method plot_probabilities (models=None, dataset=\"test\", target=1, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the probability distribution of the classes in the target column. Only for classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". target: int or str, optional (default=1) Probability of being that class in the target column as index or name. Only for multiclass classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_probabilities"},{"location":"API/plots/plot_probabilities/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y=\"RainTomorrow\") atom.run(\"rf\") atom.plot_probabilities()","title":"Example"},{"location":"API/plots/plot_qq/","text":"plot_qq method plot_qq (columns=0, distribution=\"norm\", title=None, figsize=None, filename=None, display=True) [source] Plot a quantile-quantile plot. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. Selected categorical columns are ignored. distribution: str, sequence or None, optional (default=\"norm\") Name of the scipy.stats distribution to fit to the columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_qq(columns=[0, 1], distribution=\"triang\")","title":"plot_qq"},{"location":"API/plots/plot_qq/#plot_qq","text":"method plot_qq (columns=0, distribution=\"norm\", title=None, figsize=None, filename=None, display=True) [source] Plot a quantile-quantile plot. Parameters: columns: int, str, slice or sequence, optional (default=0) Slice, names or indices of the columns to plot. Selected categorical columns are ignored. 
distribution: str, sequence or None, optional (default=\"norm\") Name of the scipy.stats distribution to fit to the columns. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_qq"},{"location":"API/plots/plot_qq/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_qq(columns=[0, 1], distribution=\"triang\")","title":"Example"},{"location":"API/plots/plot_residuals/","text":"plot_residuals method plot_residuals (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the horizontal axis. The gray, intersected line shows the identity line. This plot can be useful to analyze the variance of the error of the regressor. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_residuals()","title":"plot_residuals"},{"location":"API/plots/plot_residuals/#plot_residuals","text":"method plot_residuals (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the horizontal axis. The gray, intersected line shows the identity line. This plot can be useful to analyze the variance of the error of the regressor. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate. Only for regression tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
display: bool, optional (default=True) Whether to render the plot.","title":"plot_residuals"},{"location":"API/plots/plot_residuals/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run([\"OLS\", \"LGB\"], metric=\"MAE\") atom.plot_residuals()","title":"Example"},{"location":"API/plots/plot_results/","text":"plot_results method plot_results (models=None, metric=0, title=None, figsize=None, filename=None, display=True) [source] Plot of the model results after the evaluation. If all models applied bagging, the plot is a boxplot. If not, the plot is a barplot. Models are ordered based on their score from the top down. The score is either the mean_bagging or metric_test attribute of the model, selected in that order. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts the size to the number of models. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=5) atom.plot_results() # With bagging... # And without bagging... atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=0) atom.plot_results()","title":"plot_results"},{"location":"API/plots/plot_results/#plot_results","text":"method plot_results (models=None, metric=0, title=None, figsize=None, filename=None, display=True) [source] Plot of the model results after the evaluation. If all models applied bagging, the plot is a boxplot. If not, the plot is a barplot. Models are ordered based on their score from the top down. The score is either the mean_bagging or metric_test attribute of the model, selected in that order. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=None) Figure's size, format as (x, y). If None, adapts the size to the number of models. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_results"},{"location":"API/plots/plot_results/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=5) atom.plot_results() # With bagging... # And without bagging... atom.run([\"QDA\", \"Tree\", \"RF\", \"ET\", \"LGB\"], metric=\"f1\", bagging=0) atom.plot_results()","title":"Example"},{"location":"API/plots/plot_rfecv/","text":"plot_rfecv method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every subset of the dataset. Only available if RFECV was applied on the data.
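For context, a minimal sketch of the RFECV workflow described above, assuming placeholder data X and y (the title and filename values are purely illustrative):

```python
from atom import ATOMClassifier

# Placeholder data: X and y stand in for your own feature set and target.
atom = ATOMClassifier(X, y)

# Recursive feature elimination with cross-validation; only the selected
# features remain in atom's dataset once the transformer has run.
atom.feature_selection(strategy="RFECV", solver="LGB", scoring="precision")

# Plot the cross-validated score obtained on every subset of features.
atom.plot_rfecv(title="RFECV scores", filename="rfecv")
```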
Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"RFECV\", solver=\"LGB\", scoring=\"precision\") atom.plot_rfecv()","title":"plot_rfecv"},{"location":"API/plots/plot_rfecv/#plot_rfecv","text":"method plot_rfecv (title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every subset of the dataset. Only available if RFECV was applied on the data. Parameters: title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_rfecv"},{"location":"API/plots/plot_rfecv/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.feature_selection(strategy=\"RFECV\", solver=\"LGB\", scoring=\"precision\") atom.plot_rfecv()","title":"Example"},{"location":"API/plots/plot_roc/","text":"plot_roc method plot_roc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the Receiver Operating Characteristics curve. The legend shows the Area Under the ROC Curve (AUC) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_roc(filename=\"roc_curve\")","title":"plot_roc"},{"location":"API/plots/plot_roc/#plot_roc","text":"method plot_roc (models=None, dataset=\"test\", title=None, figsize=(10, 6), filename=None, display=True) [source] Plot the Receiver Operating Characteristics curve. The legend shows the Area Under the ROC Curve (AUC) score. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. 
display: bool, optional (default=True) Whether to render the plot.","title":"plot_roc"},{"location":"API/plots/plot_roc/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"LR\", \"RF\", \"LGB\"], metric=\"roc_auc\") atom.plot_roc(filename=\"roc_curve\")","title":"Example"},{"location":"API/plots/plot_scatter_matrix/","text":"plot_scatter_matrix method plot_scatter_matrix (columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs) [source] Plot a matrix of scatter plots. A subset of at most 250 random samples is selected from every column to avoid cluttering the plot. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's pairplot . Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_scatter_matrix(columns=slice(0, 5))","title":"plot_scatter_matrix"},{"location":"API/plots/plot_scatter_matrix/#plot_scatter_matrix","text":"method plot_scatter_matrix (columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs) [source] Plot a matrix of scatter plots. A subset of at most 250 random samples is selected from every column to avoid cluttering the plot. Parameters: columns: slice, sequence or None, optional (default=None) Slice, names or indices of the columns to plot. If None, plot all columns in the dataset. Selected categorical columns are ignored. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 10)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for seaborn's pairplot .","title":"plot_scatter_matrix"},{"location":"API/plots/plot_scatter_matrix/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.plot_scatter_matrix(columns=slice(0, 5))","title":"Example"},{"location":"API/plots/plot_successive_halving/","text":"plot_successive_halving method plot_successive_halving (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using successive halving . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.
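As a hedged illustration of the metric parameter above, the sketch below assumes a multi-metric successive halving run on placeholder data X and y; each call then plots the scores of one metric, selected by index or name:

```python
from atom import ATOMClassifier

# Placeholder data; passing a sequence of metrics assumes a multi-metric run.
atom = ATOMClassifier(X, y)
atom.successive_halving(["bag", "adab", "et", "lgb"], metric=["f1", "recall"], bagging=5)

atom.plot_successive_halving(metric=0)         # scores for the first metric (f1)
atom.plot_successive_halving(metric="recall")  # the same plot, selected by name
```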
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.successive_halving([\"bag\", \"adab\", \"et\", \"lgb\"], metric=\"accuracy\", bagging=5) atom.plot_successive_halving(filename=\"plot_successive_halving\")","title":"plot_successive_halving"},{"location":"API/plots/plot_successive_halving/#plot_successive_halving","text":"method plot_successive_halving (models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using successive halving . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all the models in the pipeline are selected. metric: int or str, optional (default=0) Index or name of the metric to plot. Only for multi-metric runs. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_successive_halving"},{"location":"API/plots/plot_successive_halving/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.successive_halving([\"bag\", \"adab\", \"et\", \"lgb\"], metric=\"accuracy\", bagging=5) atom.plot_successive_halving(filename=\"plot_successive_halving\")","title":"Example"},{"location":"API/plots/plot_threshold/","text":"plot_threshold method plot_threshold (models=None, metric=None, dataset=\"test\", steps=100, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot metric performances against threshold values. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: str, callable, list, tuple or None, optional (default=None) Metric(s) to plot. These can be one of sklearn's predefined scorers, a metric function or a sklearn scorer object (see the user guide ). If None, the metric used to run the pipeline is used. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". steps: int, optional (default=100) Number of thresholds measured. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier from sklearn.metrics import recall_score atom = ATOMClassifier(X, y) atom.run(\"LGB\") atom.plot_threshold(metric=[\"accuracy\", \"f1\", recall_score])","title":"plot_threshold"},{"location":"API/plots/plot_threshold/#plot_threshold","text":"method plot_threshold (models=None, metric=None, dataset=\"test\", steps=100, title=None, figsize=(10, 6), filename=None, display=True) [source] Plot metric performances against threshold values. Only for binary classification tasks. Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. metric: str, callable, list, tuple or None, optional (default=None) Metric(s) to plot. 
These can be one of sklearn's predefined scorers, a metric function or a sklearn scorer object (see the user guide ). If None, the metric used to run the pipeline is used. dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Options are \"train\", \"test\" or \"both\". steps: int, optional (default=100) Number of thresholds measured. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6)) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"plot_threshold"},{"location":"API/plots/plot_threshold/#example","text":"from atom import ATOMClassifier from sklearn.metrics import recall_score atom = ATOMClassifier(X, y) atom.run(\"LGB\") atom.plot_threshold(metric=[\"accuracy\", \"f1\", recall_score])","title":"Example"},{"location":"API/plots/scatter_plot/","text":"scatter_plot method scatter_plot (models=None, index=None, feature=0, target=1, title=None, figsize=(10, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and the SHAP value of the same feature on the y-axis. This shows how the model depends on the given feature, and is like a richer extension of the classical partial dependence plots. Vertical dispersion of the data points represents interaction effects. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The scatter plot does not support plotting a single sample. feature: int or str, optional (default=0) Index or name of the feature to plot. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's scatter plot . Example from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.scatter_plot(feature=\"bmi\")","title":"scatter_plot"},{"location":"API/plots/scatter_plot/#scatter_plot","text":"method scatter_plot (models=None, index=None, feature=0, target=1, title=None, figsize=(10, 6), filename=None, display=True, **kwargs) [source] Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and the SHAP value of the same feature on the y-axis. This shows how the model depends on the given feature, and is like a richer extension of the classical partial dependence plots. Vertical dispersion of the data points represents interaction effects. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. 
Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: tuple, slice or None, optional (default=None) Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows n until m. If None, it selects all rows in the test set. The scatter plot does not support plotting a single sample. feature: int or str, optional (default=0) Index or name of the feature to plot. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple, optional (default=(10, 6))) Figure's size, format as (x, y). filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. **kwargs Additional keyword arguments for SHAP's scatter plot .","title":"scatter_plot"},{"location":"API/plots/scatter_plot/#example","text":"from atom import ATOMRegressor atom = ATOMRegressor(X, y) atom.run(\"RF\") atom.scatter_plot(feature=\"bmi\")","title":"Example"},{"location":"API/plots/waterfall_plot/","text":"waterfall_plot method waterfall_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True) [source] Plot SHAP's waterfall plot for a single prediction. The SHAP value of a feature represents the impact of the evidence provided by that feature on the model\u2019s output. The waterfall plot is designed to visually display how the SHAP values (evidence) of each feature move the model output from our prior expectation under the background data distribution, to the final model prediction given the evidence of all the features. Features are sorted by the magnitude of their SHAP values with the smallest magnitude features grouped together at the bottom of the plot when the number of features in the models exceeds the show parameter. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int or None, optional (default=None) Index of the row in the dataset to plot. If None, it selects the first row in the test set. The waterfall plot does not support plotting multiple samples. show: int or None, optional (default=None) Number of features to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"Tree\") atom.tree.waterfall_plot(index=120)","title":"waterfall_plot"},{"location":"API/plots/waterfall_plot/#waterfall_plot","text":"method waterfall_plot (models=None, index=None, show=None, target=1, title=None, figsize=None, filename=None, display=True) [source] Plot SHAP's waterfall plot for a single prediction. 
The SHAP value of a feature represents the impact of the evidence provided by that feature on the model\u2019s output. The waterfall plot is designed to visually display how the SHAP values (evidence) of each feature move the model output from our prior expectation under the background data distribution, to the final model prediction given the evidence of all the features. Features are sorted by the magnitude of their SHAP values with the smallest magnitude features grouped together at the bottom of the plot when the number of features in the models exceeds the show parameter. Read more about SHAP plots in the user guide . Parameters: models: str, sequence or None, optional (default=None) Name of the models to plot. If None, all models in the pipeline are selected. Note that selecting multiple models will raise an exception. To avoid this, call the plot from a model. index: int or None, optional (default=None) Index of the row in the dataset to plot. If None, it selects the first row in the test set. The waterfall plot does not support plotting multiple samples. show: int or None, optional (default=None) Number of features to show in the plot. None to show all. target: int or str, optional (default=1) Index or name of the class in the target column to look at. Only for multi-class classification tasks. title: str or None, optional (default=None) Plot's title. If None, the title is left empty. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, it adapts the size to the number of features shown. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot.","title":"waterfall_plot"},{"location":"API/plots/waterfall_plot/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"Tree\") atom.tree.waterfall_plot(index=120)","title":"Example"},{"location":"API/predicting/decision_function/","text":"decision_function method decision_function (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a decision_function method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted confidence scores of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. 
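To make the pipeline parameter described above more concrete, here is an illustrative sketch (not taken from the library's own examples) that reuses the transformers shown elsewhere in these docs; X, y and X_new are placeholder data sets, with X_new being new, untransformed samples:

```python
from atom import ATOMClassifier

# Placeholder data sets.
atom = ATOMClassifier(X, y)
atom.clean()
atom.impute(strat_num="knn", strat_cat="drop")
atom.prune(strategy="z-score", method="min_max", max_sigma=2)
atom.run("kSVM", metric="accuracy")

# Default (pipeline=None): only transformers applied on the whole dataset are used.
scores = atom.ksvm.decision_function(X_new)

# pipeline=True: run X_new through every transformer before predicting.
scores = atom.ksvm.decision_function(X_new, pipeline=True)

# pipeline=[0, 2]: use only the first and third transformers (clean and prune).
scores = atom.ksvm.decision_function(X_new, pipeline=[0, 2])

# pipeline=False: skip all transformers, e.g. when X_new is already transformed.
scores = atom.ksvm.decision_function(X_new, pipeline=False)
```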
Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"kSVM\", metric=\"accuracy\") # Predict confidence scores on new data predictions = atom.ksvm.decision_function(X_new)","title":"decision_function"},{"location":"API/predicting/decision_function/#decision_function","text":"method decision_function (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a decision_function method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted confidence scores of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"decision_function"},{"location":"API/predicting/decision_function/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run(\"kSVM\", metric=\"accuracy\") # Predict confidence scores on new data predictions = atom.ksvm.decision_function(X_new)","title":"Example"},{"location":"API/predicting/predict/","text":"predict method predict (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted targets with shape=(n_samples,). Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict(X_new)","title":"predict"},{"location":"API/predicting/predict/#predict","text":"method predict (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). 
pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray Predicted targets with shape=(n_samples,).","title":"predict"},{"location":"API/predicting/predict/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict(X_new)","title":"Example"},{"location":"API/predicting/predict_log_proba/","text":"predict_log_proba method predict_log_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_log_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray The class log-probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_log_proba(X_new)","title":"predict_log_proba"},{"location":"API/predicting/predict_log_proba/#predict_log_proba","text":"method predict_log_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_log_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. 
Returns: p: np.ndarray The class log-probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"predict_log_proba"},{"location":"API/predicting/predict_log_proba/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_log_proba(X_new)","title":"Example"},{"location":"API/predicting/predict_proba/","text":"predict_proba method predict_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: p: np.ndarray The class probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_proba(X_new)","title":"predict_proba"},{"location":"API/predicting/predict_proba/#predict_proba","text":"method predict_proba (X, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a predict_proba method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. 
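As a quick sanity check on the relation between the two methods above, the following sketch (with placeholder data X, y and X_new) verifies that exponentiating the log-probabilities recovers the probabilities:

```python
import numpy as np
from atom import ATOMClassifier

# Placeholder data; X_new contains new, untransformed samples.
atom = ATOMClassifier(X, y)
atom.run(["Tree", "AdaB"], metric="AP")

proba = atom.adab.predict_proba(X_new)
log_proba = atom.adab.predict_log_proba(X_new)

# Both methods transform X_new through the same branch, so the outputs
# agree up to floating point precision.
assert np.allclose(np.exp(log_proba), proba)
```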
Returns: p: np.ndarray The class probabilities of the input samples, with shape=(n_samples,) for binary classification tasks and (n_samples, n_classes) for multiclass classification tasks.","title":"predict_proba"},{"location":"API/predicting/predict_proba/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"Tree\", \"AdaB\"], metric=\"AP\", n_calls=10) # Make predictions on new data predictions = atom.adab.predict_proba(X_new)","title":"Example"},{"location":"API/predicting/score/","text":"score method score (X, y, sample_weights=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a score method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). sample_weights: sequence or None, optional (default=None) Sample weights corresponding to y. pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: score: np.float64 Mean accuracy or r2 (depending on the task) of self.predict(X) with respect to y. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"MNB\", \"KNN\", \"kSVM\"], metric=\"precision\") # Get the mean accuracy on new data predictions = atom.mnb.score(X_new, y_new)","title":"score"},{"location":"API/predicting/score/#score","text":"method score (X, y, sample_weights=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the winner attribute). If called from a model, it will use that model. The estimator must have a score method. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set with shape=(n_samples, n_features). y: int, str or sequence If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). sample_weights: sequence or None, optional (default=None) Sample weights corresponding to y. pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. 
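The y and sample_weights parameters above allow some flexibility in how new data is scored. A hypothetical sketch, assuming df_new is a new dataframe that still contains the "RainTomorrow" target column:

```python
import numpy as np
from atom import ATOMClassifier

# Placeholder data; df_new is new data that includes the target column.
atom = ATOMClassifier(X, y="RainTomorrow")
atom.run("MNB", metric="precision")

# y passed as the name of the target column inside the provided data.
acc = atom.mnb.score(df_new, y="RainTomorrow")

# Optionally weigh the samples, e.g. to emphasize more recent observations.
weights = np.linspace(0.5, 1.0, len(df_new))
acc_weighted = atom.mnb.score(df_new, y="RainTomorrow", sample_weights=weights)
```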
Returns: score: np.float64 Mean accuracy or r2 (depending on the task) of self.predict(X) with respect to y.","title":"score"},{"location":"API/predicting/score/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.run([\"MNB\", \"KNN\", \"kSVM\"], metric=\"precision\") # Get the mean accuracy on new data predictions = atom.mnb.score(X_new, y_new)","title":"Example"},{"location":"API/predicting/transform/","text":"transform method transform (X, y=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the pipeline parameter to customize this behaviour. This method can only be called from atom, not from the models. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Features to transform, with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored in the transformers. If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. Only returned if provided. Example from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean() atom.impute(strat_num=\"knn\", strat_cat=\"drop\") atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2) # Transform new data through all data cleaning steps X_transformed = atom.transform(X_new)","title":"transform"},{"location":"API/predicting/transform/#transform","text":"method transform (X, y=None, pipeline=None, verbose=None) [source] Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the pipeline parameter to customize this behaviour. This method can only be called from atom, not from the models. Parameters: X: dict, list, tuple, np.ndarray or pd.DataFrame Features to transform, with shape=(n_samples, n_features). y: int, str, sequence or None, optional (default=None) If None: y is ignored in the transformers. If int: Position of the target column in X. If str: Name of the target column in X. Else: Target column with shape=(n_samples,). pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. If None: Only transformers that are applied on the whole dataset are used. If False: Don't use any transformers. If True: Use all transformers in the pipeline. If sequence: Transformers to use, selected by their index in the pipeline. verbose: int or None, optional (default=None) Verbosity level of the output. If None, it uses the transformer's own verbosity. Returns: X: pd.DataFrame Transformed feature set. y: pd.Series Transformed target column. 
Only returned if provided.","title":"transform"},{"location":"API/predicting/transform/#example","text":"from atom import ATOMClassifier atom = ATOMClassifier(X, y) atom.clean() atom.impute(strat_num=\"knn\", strat_cat=\"drop\") atom.prune(strategy=\"z-score\", method=\"min_max\", max_sigma=2) # Transform new data through all data cleaning steps X_transformed = atom.transform(X_new)","title":"Example"},{"location":"API/training/directclassifier/","text":"DirectClassifier class atom.training. DirectClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluates the models to the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. 
This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. 
logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. 
The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. 
Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import DirectClassifier # Run the pipeline trainer = DirectClassifier([\"Tree\", \"RF\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.scoring(\"auc\") trainer.Tree.plot_bo()","title":"DirectClassifier"},{"location":"API/training/directclassifier/#directclassifier","text":"class atom.training. DirectClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline.
The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. 
est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"DirectClassifier"},{"location":"API/training/directclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/directclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. 
trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/directclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/directclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/directclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. 
Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. 
Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/directclassifier/#example","text":"from atom.training import DirectClassifier # Run the pipeline trainer = DirectClassifier([\"Tree\", \"RF\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.scoring(\"auc\") trainer.Tree.plot_bo()","title":"Example"},{"location":"API/training/directregressor/","text":"DirectRegressor class atom.training. DirectRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. 
delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. 
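As a brief, hedged sketch of how the parameters and data attributes above fit together (train and test are assumed to be pre-split pandas DataFrames with the target as the last column; the model choices and values are illustrative only):

from atom.training import DirectRegressor

# Instantiate with a few of the parameters documented above (all values are illustrative)
trainer = DirectRegressor(
    models=['Tree', 'RF'],
    metric='r2',
    n_calls=10,
    n_initial_points=5,
    est_params={'Tree': {'max_depth': 5}},  # per-model parameters, keyed by acronym
    bo_params={'max_time': 600, 'cv': 3},   # stop the BO after 10 minutes, 3-fold cross-validation
    bagging=5,
    n_jobs=-1,
    verbose=1,
    random_state=42,
)
trainer.run(train, test)

# The data attributes stay in sync: updating one updates the rest
print(trainer.shape)       # shape of the complete dataset
print(trainer.n_features)  # number of feature columns (excludes the target)
trainer.test = trainer.test.drop(trainer.test.index[0])  # also updates trainer.dataset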
Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. 
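A short sketch of the canvas context manager and plot aesthetics described above; it assumes a fitted regression trainer and that the regression plot methods plot_residuals and plot_errors are available on it:

# Temporarily enlarge the title font, draw two plots in one figure, then restore the defaults
trainer.title_fontsize = 16
with trainer.canvas(nrows=1, ncols=2, title='Model diagnostics', figsize=(10, 4)):
    trainer.plot_residuals()
    trainer.plot_errors()
trainer.reset_aesthetics()  # reset style, palette and font sizes to their default values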
method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import DirectRegressor # Run the pipeline trainer = DirectRegressor([\"OLS\", \"BR\"], n_calls=5, n_initial_points=3, bagging=5) trainer.run(train, test) # Analyze the results trainer.scoring(\"mse\") trainer.plot_results()","title":"DirectRegressor"},{"location":"API/training/directregressor/#directregressor","text":"class atom.training. 
DirectRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models on the data in the pipeline. The following steps are applied: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the DirectRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. 
If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. 
If None, the random number generator is the RandomState instance used by numpy.random .","title":"DirectRegressor"},{"location":"API/training/directregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/directregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/directregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results with the model acronyms as index. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/directregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/directregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. 
See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. 
If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/directregressor/#example","text":"from atom.training import DirectRegressor # Run the pipeline trainer = DirectRegressor([\"OLS\", \"BR\"], n_calls=5, n_initial_points=3, bagging=5) trainer.run(train, test) # Analyze the results trainer.scoring(\"mse\") trainer.plot_results()","title":"Example"},{"location":"API/training/successivehalvingclassifier/","text":"SuccessiveHalvingClassifier class atom.training. SuccessiveHalvingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. 
Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip the last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. 
dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. 
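For illustration (this snippet is not from the original docs), the utility attributes above could be inspected after a run along these lines, assuming train and test have already been prepared:

from atom.training import SuccessiveHalvingClassifier

trainer = SuccessiveHalvingClassifier(['Tree', 'Bag', 'RF', 'ET'], metric='f1', n_calls=0)
trainer.run(train, test)

print(trainer.models)   # acronyms of the models in the pipeline
print(trainer.metric)   # metric used to fit the models
print(trainer.results)  # dataframe with metric_train, metric_test, time_fit, ... per run
print(trainer.winner)   # model subclass that performed best on the test set
if trainer.errors:
    print(trainer.errors)  # exceptions encountered during fitting, keyed by model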
Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. 
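As a hedged sketch of get_class_weight, the returned weights could be passed back to estimators that accept a class_weight parameter via est_params; the pairing below is an assumption of this example, not something the docs prescribe, and trainer is a fitted SuccessiveHalvingClassifier as above:

from atom.training import SuccessiveHalvingClassifier

# Weights are inversely proportional to the class frequencies in the training set
weights = trainer.get_class_weight(dataset='train')
print(weights)  # e.g. {0: 0.6, 1: 2.9} for an imbalanced binary target (values depend on the data)

# Feed the weights to models that support class_weight, keyed by model acronym
balanced = SuccessiveHalvingClassifier(
    ['LR', 'RF'],
    est_params={'LR': {'class_weight': weights}, 'RF': {'class_weight': weights}},
)
balanced.run(train, test)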
method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for the true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to apply the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. 
Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import SuccessiveHalvingClassifier # Run the pipeline trainer = SuccessiveHalvingClassifier([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"SuccessiveHalvingClassifier"},{"location":"API/training/successivehalvingclassifier/#successivehalvingclassifier","text":"class atom.training. SuccessiveHalvingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. 
needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip the last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split into a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. 
If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"SuccessiveHalvingClassifier"},{"location":"API/training/successivehalvingclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/successivehalvingclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/successivehalvingclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/successivehalvingclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. 
tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/successivehalvingclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for the true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to apply the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. 
weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/successivehalvingclassifier/#example","text":"from atom.training import SuccessiveHalvingClassifier # Run the pipeline trainer = SuccessiveHalvingClassifier([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"Example"},{"location":"API/training/successivehalvingregressor/","text":"SuccessiveHalvingRegressor class atom.training. SuccessiveHalvingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. 
needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip the last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split into a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. 
verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. 
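A rough sketch of how a few of these methods combine after a run; it assumes a fitted SuccessiveHalvingRegressor (see the example further down), and plot_errors is taken from the plots documentation rather than from this page.
# Sketch only: assumes `trainer` is a SuccessiveHalvingRegressor that has already been run.
trainer.scoring(\"r2\")  # print every model's r2 score on the test set
with trainer.canvas(1, 2, title=\"Successive halving results\"):
    trainer.plot_successive_halving()  # plot used in the example below
    trainer.plot_errors()              # assumed regression plot from the plots documentation
trainer.save(\"SHR\", save_data=False)   # \"SHR\" is an illustrative file name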
method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to apply the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. 
Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import SuccessiveHalvingRegressor # Run the pipeline trainer = SuccessiveHalvingRegressor([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"SuccessiveHalvingRegressor"},{"location":"API/training/successivehalvingregressor/#successivehalvingregressor","text":"class atom.training. SuccessiveHalvingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a successive halving fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the complete training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the SuccessiveHalvingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. 
Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. skip_runs: int, optional (default=0) Skip the last skip_runs runs of the successive halving. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split into a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. 
dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"SuccessiveHalvingRegressor"},{"location":"API/training/successivehalvingregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/successivehalvingregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/successivehalvingregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. 
metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/successivehalvingregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/successivehalvingregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. 
Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as an attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to apply the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: SuccessiveHalvingRegressor Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/successivehalvingregressor/#example","text":"from atom.training import SuccessiveHalvingRegressor # Run the pipeline trainer = SuccessiveHalvingRegressor([\"Tree\", \"Bag\", \"RF\", \"ET\"], n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_successive_halving()","title":"Example"},{"location":"API/training/trainsizingclassifier/","text":"TrainSizingClassifier class atom.training. TrainSizingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. 
The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If <=1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. 
n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict or None, optional (default=None) Additional parameters for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split into a subtrain and validation set. early_stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in the last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . 
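A hedged sketch of how these parameters can be combined when instantiating the trainer; the train and test variables are assumed to be prepared beforehand, as in the examples on the other training pages.
import numpy as np
from atom.training import TrainSizingClassifier

# Sketch only: five training-set fractions, a short BO per model and a fixed seed.
trainer = TrainSizingClassifier(
    models=[\"LR\", \"RF\"],
    metric=\"f1\",
    train_sizes=np.linspace(0.2, 1.0, 5),
    n_calls=10,
    n_initial_points=3,
    bo_params={\"cv\": 3, \"max_time\": 600},
    bagging=5,
    n_jobs=-1,
    verbose=1,
    random_state=1,
)
trainer.run(train, test)  # train and test are assumed to exist already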
Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. 
Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. 
Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. \"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Data set on which to calculate the metric. Choose from \"train\" or \"test\". **kwargs Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: TrainSizingClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import TrainSizingClassifier # Run the pipeline trainer = TrainSizingClassifier(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"TrainSizingClassifier"},{"location":"API/training/trainsizingclassifier/#trainsizingclassifier","text":"class atom.training. TrainSizingClassifier (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params=None, bo_params=None, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingClassifier instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"GNB\" for Gaussian Naive Bayes \"MNB\" for Multinomial Naive Bayes \"BNB\" for Bernoulli Naive Bayes \"CatNB\" for Categorical Naive Bayes \"CNB\" for Complement Naive Bayes \"Ridge\" for Ridge Classification \"LR\" for Logistic Regression \"LDA\" for Linear Discriminant Analysis \"QDA\" for Quadratic Discriminant Analysis \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict or None, optional (default=None) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. 
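As a sketch of the est_params behaviour described above: with a single model the estimator's parameters can be passed directly, and with several models the acronyms are used as keys. The hyperparameter names (n_estimators, max_depth, criterion) belong to the underlying scikit-learn estimators, and the flat-dictionary form for a single model is an assumption based on the wording above.

```python
from atom.training import TrainSizingClassifier

# Single model: pass the estimator's parameters directly (assumed from the
# wording above). n_calls=0 skips the BO and uses these values as-is.
trainer = TrainSizingClassifier("RF", n_calls=0, est_params={"n_estimators": 500})

# Multiple models: use the acronyms as keys. A key ending in "_fit", e.g. a
# hypothetical "sample_weight_fit", would be forwarded to the estimator's
# fit method instead of its constructor.
trainer = TrainSizingClassifier(
    models=["RF", "Tree"],
    n_calls=0,
    est_params={
        "RF": {"n_estimators": 500, "max_depth": 8},
        "Tree": {"criterion": "entropy"},
    },
)
```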
bo_params: dict or None, optional (default=None) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random .","title":"TrainSizingClassifier"},{"location":"API/training/trainsizingclassifier/#attributes","text":"","title":"Attributes"},{"location":"API/training/trainsizingclassifier/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. 
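For example (a hedged sketch that assumes trainer is an already fitted TrainSizingClassifier, e.g. from the examples earlier on this page), the data attributes can be read and overwritten like this:

```python
# Reading the data attributes.
print(trainer.shape)     # (n_rows, n_columns) of the complete dataset
print(trainer.columns)   # column names, including the target
print(trainer.target)    # name of the target column
X_test, y_test = trainer.X_test, trainer.y_test

# Overwriting an attribute updates the related attributes automatically,
# e.g. dropping the first row of the test set.
trainer.test = trainer.test.drop(trainer.test.index[0])
```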
X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/trainsizingclassifier/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/trainsizingclassifier/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/trainsizingclassifier/#methods","text":"calibrate Calibrate the winning model. canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_class_weight Return class weights for a balanced dataset. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method calibrate (**kwargs) [source] Applies probability calibration on the winning model. The calibration is performed using sklearn's CalibratedClassifierCV class. The model is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier will replace the estimator attribute. After calibrating, all prediction attributes of the winning model will reset. Parameters: **kwargs Additional keyword arguments for the CalibratedClassifierCV instance. Using cv=\"prefit\" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. 
The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_class_weight (dataset=\"train\") [source] Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely proportional to class frequencies in the selected data set. Parameters: dataset: str, optional (default=\"train\") Data set from which to get the weights. Choose between \"train\", \"test\" or \"dataset\". Returns: class_weights: dict Classes with the corresponding weights. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's classification SCORERS or one of the following custom metrics: \"cm\" for the confusion matrix. \"tn\" for true negatives. \"fp\" for false positives. \"fn\" for false negatives. 
\"tp\" for true positives. \"lift\" for the lift metric. \"fpr\" for the false positive rate. \"tpr\" for true positive rate. \"sup\" for the support metric. If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Logistic Regression is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/trainsizingclassifier/#example","text":"from atom.training import TrainSizingClassifier # Run the pipeline trainer = TrainSizingClassifier(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"Example"},{"location":"API/training/trainsizingregressor/","text":"TrainSizingRegressor class atom.training. TrainSizingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. 
Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. 
Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random . Attributes Data attributes The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. 
n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column. Utility attributes Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run. Plot attributes Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes. Methods canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. 
level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. Example from atom.training import TrainSizingRegressor # Run the pipeline trainer = TrainSizingRegressor(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"TrainSizingRegressor"},{"location":"API/training/trainsizingregressor/#trainsizingregressor","text":"class atom.training. 
TrainSizingRegressor (models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0, n_initial_points=5, est_params={}, bo_params={}, bagging=0, n_jobs=1, verbose=0, logger=None, random_state=None) [source] Fit and evaluate the models in a train sizing fashion. The pipeline applies the following steps per iteration: The optimal hyperparameters are selected using a bayesian optimization algorithm. The model is fitted on the training set using the best combinations of hyperparameters found. Using a bagging algorithm, various scores on the test set are calculated. Just like atom, you can predict , plot and call any model from the TrainSizingRegressor instance. Read more in the user guide . Parameters: models: str or sequence Models to fit to the data. Use a custom estimator or the model's predefined acronyms. Possible values are (case-insensitive): \"GP\" for Gaussian Process \"OLS\" for Ordinary Least Squares \"Ridge\" for Ridge Regression \"Lasso\" for Lasso Regression \"EN\" for ElasticNet \"BR\" for Bayesian Ridge \"ARD\" for Automated Relevance Determination \"KNN\" for K-Nearest Neighbors \"RNN\" for Radius Nearest Neighbors \"Tree\" for a single Decision Tree \"Bag\" for Bagging \"ET\" for Extra-Trees \"RF\" for Random Forest \"AdaB\" for AdaBoost \"GBM\" for Gradient Boosting Machine \"XGB\" for XGBoost (only available if package is installed) \"LGB\" for LightGBM (only available if package is installed) \"CatB\" for CatBoost (only available if package is installed) \"lSVM\" for Linear-SVM \"kSVM\" for Kernel-SVM \"PA\" for Passive Aggressive \"SGD\" for Stochastic Gradient Descent \"MLP\" for Multi-layer Perceptron metric: str, callable or sequence, optional (default=None) Metric on which to fit the models. Choose from any of sklearn's predefined SCORERS , a score (or loss) function with signature metric(y, y_pred, **kwargs), a scorer object or a sequence of these. If multiple metrics are selected, only the first is used to optimize the BO. If None, a default metric is selected: \"f1\" for binary classification \"f1_weighted\" for multiclass classification \"r2\" for regression greater_is_better: bool or sequence, optional (default=True) Whether the metric is a score function or a loss function, i.e. if True, a higher score is better and if False, lower is better. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_proba: bool or sequence, optional (default=False) Whether the metric function requires probability estimates out of a classifier. If True, make sure that every selected model has a predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. needs_threshold: bool or sequence, optional (default=False) Whether the metric function takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. Will be ignored if the metric is a string or a scorer. If sequence, the n-th value will apply to the n-th metric. train_sizes: sequence, optional (default=np.linspace(0.2, 1.0, 5)) Sequence of training set sizes used to run the trainings. If < =1: Fraction of the training set. If >1: Total number of samples. n_calls: int or sequence, optional (default=0) Maximum number of iterations of the BO. It includes the random points of n_initial_points . 
If 0, skip the BO and fit the model on its default parameters. If sequence, the n-th value will apply to the n-th model. n_initial_points: int or sequence, optional (default=5) Initial number of random tests of the BO before fitting the surrogate function. If equal to n_calls , the optimizer will technically be performing a random search. If sequence, the n-th value will apply to the n-th model. est_params: dict, optional (default={}) Additional parameters for the estimators. See the corresponding documentation for the available options. For multiple models, use the acronyms as key and a dictionary of the parameters as value. Add _fit to the parameter's name to pass it to the fit method instead of the initializer. bo_params: dict, optional (default={}) Additional parameters to for the BO. These can include: base_estimator: str, optional (default=\"GP\") Base estimator to use in the BO. Choose from: \"GP\" for Gaussian Process \"RF\" for Random Forest \"ET\" for Extra-Trees \"GBRT\" for Gradient Boosted Regression Trees max_time: int, optional (default=np.inf) Stop the optimization after max_time seconds. delta_x: int or float, optional (default=0) Stop the optimization when |x1 - x2| < delta_x . delta_y: int or float, optional (default=0) Stop the optimization if the 5 minima are within delta_y (skopt always minimizes the function). cv: int, optional (default=5) Number of folds for the cross-validation. If 1, the training set is randomly split in a subtrain and validation set. early stopping: int, float or None, optional (default=None) Training will stop if the model didn't improve in last early_stopping rounds. If < 1, fraction of rounds from the total. If None, no early stopping is performed. Only available for models that allow in-training evaluation. callback: callable or list of callables, optional (default=None) Callbacks for the BO. dimensions: dict, array or None, optional (default=None) Custom hyperparameter space for the bayesian optimization. Can be an array to share dimensions across models or a dictionary with the model's name as key. If None, ATOM's predefined dimensions are used. plot: bool, optional (default=False) Whether to plot the BO's progress as it runs. Creates a canvas with two plots: the first plot shows the score of every trial and the second shows the distance between the last consecutive steps. Additional keyword arguments for skopt's optimizer. bagging: int or sequence, optional (default=0) Number of data sets (bootstrapped from the training set) to use in the bagging algorithm. If 0, no bagging is performed. If sequence, the n-th value will apply to the n-th model. n_jobs: int, optional (default=1) Number of cores to use for parallel processing. If >0: Number of cores to use. If -1: Use all available cores. If < -1: Use available_cores - 1 + n_jobs. Beware that using multiple processes on the same machine may cause memory issues for large datasets. verbose: int, optional (default=0) Verbosity level of the class. Possible values are: 0 to not print anything. 1 to print basic information. 2 to print detailed information. logger: str, Logger or None, optional (default=None) If None: Doesn't save a logging file. If str: Name of the logging file. Use \"auto\" for default name. Else: Python logging.Logger instance. The default name consists of the class' name followed by the timestamp of the logger's creation. random_state: int or None, optional (default=None) Seed used by the random number generator. 
If None, the random number generator is the RandomState instance used by numpy.random .","title":"TrainSizingRegressor"},{"location":"API/training/trainsizingregressor/#attributes","text":"","title":"Attributes"},{"location":"API/training/trainsizingregressor/#data-attributes","text":"The dataset can be accessed at any time through multiple attributes, e.g. calling trainer.train will return the training set. The data can also be changed through these attributes, e.g. trainer.test = atom.test.drop(0) will drop the first row from the test set. Updating one of the data attributes will automatically update the rest as well. Attributes: dataset: pd.DataFrame Complete dataset in the pipeline. train: pd.DataFrame Training set. test: pd.DataFrame Test set. X: pd.DataFrame Feature set. y: pd.Series Target column. X_train: pd.DataFrame Training features. y_train: pd.Series Training target. X_test: pd.DataFrame Test features. y_test: pd.Series Test target. shape: tuple Dataset's shape: (n_rows x n_columns) or (n_rows, (shape_sample), n_cols) for deep learning datasets. columns: list Names of the columns in the dataset. n_columns: int Number of columns in the dataset. features: list Names of the features in the dataset. n_features: int Number of features in the dataset. target: str Name of the target column.","title":"Data attributes"},{"location":"API/training/trainsizingregressor/#utility-attributes","text":"Attributes: models: list List of models in the pipeline. metric: str or list Metric(s) used to fit the models. errors: dict Dictionary of the encountered exceptions during fitting (if any). winner: model Model subclass that performed best on the test set. results: pd.DataFrame Dataframe of the training results. Columns can include: metric_bo: Best score achieved during the BO. time_bo: Time spent on the BO. metric_train: Metric score on the training set. metric_test: Metric score on the test set. time_fit: Time spent fitting and evaluating. mean_bagging: Mean score of the bagging's results. std_bagging: Standard deviation score of the bagging's results. time_bagging: Time spent on the bagging algorithm. time: Total time spent on the whole run.","title":"Utility attributes"},{"location":"API/training/trainsizingregressor/#plot-attributes","text":"Attributes: style: str Plotting style. See seaborn's documentation . palette: str Color palette. See seaborn's documentation . title_fontsize: int Fontsize for the plot's title. label_fontsize: int Fontsize for labels and legends. tick_fontsize: int Fontsize for the ticks along the plot's axes.","title":"Plot attributes"},{"location":"API/training/trainsizingregressor/#methods","text":"canvas Create a figure with multiple plots. delete Remove a model from the pipeline. get_params Get parameters for this estimator. log Save information to the logger and print to stdout. reset_aesthetics Reset the plot aesthetics to their default values. reset_predictions Clear the prediction attributes from all models. run Fit and evaluate the models. save Save the instance to a pickle file. scoring Returns the scores of the models for a specific metric. set_params Set the parameters of this estimator. stacking Add a Stacking instance to the models in the pipeline. voting Add a Voting instance to the models in the pipeline. method canvas (nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True) [source] This @contextmanager allows you to draw many plots in one figure. The default option is to add two plots side by side. 
See the user guide for an example use case. Parameters: nrows: int, optional (default=1) Number of plots in length. ncols: int, optional (default=2) Number of plots in width. title: str or None, optional (default=None) Plot's title. If None, no title is displayed. figsize: tuple or None, optional (default=None) Figure's size, format as (x, y). If None, adapts size to the number of plots in the canvas. filename: str or None, optional (default=None) Name of the file. If None, the figure is not saved. display: bool, optional (default=True) Whether to render the plot. method delete (models=None) [source] Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. Parameters: models: str or sequence, optional (default=None) Name of the models to clear from the pipeline. If None, clear all models. method get_params (deep=True) [source] Get parameters for this estimator. Parameters: deep: bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params: dict Dictionary of the parameter names mapped to their values. method log (msg, level=0) [source] Write a message to the logger and print it to stdout. Parameters: msg: str Message to write to the logger and print to stdout. level: int, optional (default=0) Minimum verbosity level to print the message. method reset_aesthetics () [source] Reset the plot aesthetics to their default values. method reset_predictions () [source] Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer. method run (*arrays) [source] Fit and evaluate the models. Parameters: *arrays: sequence of indexables Training set and test set. Allowed input formats are: train, test X_train, X_test, y_train, y_test (X_train, y_train), (X_test, y_test) method save (filename=None, save_data=True) [source] Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use save_data=False . Parameters: filename: str or None, optional (default=None) Name to save the file with. None or \"auto\" to save with the __name__ of the class. save_data: bool, optional (default=True) Whether to save the data as an attribute of the instance. If False, remember to add the data to ATOMLoader when loading the file. method scoring (metric=None, dataset=\"test\", **kwargs) [source] Print all the models' scoring for a specific metric. Parameters: metric: str or None, optional (default=None) Name of the metric to calculate. Choose from any of sklearn's regression SCORERS . If None, returns the models' final results (ignores the dataset parameter). dataset: str, optional (default=\"test\") Additional keyword arguments for the metric function. method set_params (**params) [source] Set the parameters of this estimator. Parameters: **params: dict Estimator parameters. Returns: self: DirectClassifier Estimator instance. method stacking (models=None, estimator=None, stack_method=\"auto\", passthrough=False) [source] Add a Stacking instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the stacking. If None, it selects all models depending on the current branch. estimator: str, callable or None, optional (default=None) The final estimator, which is used to combine the base estimators. 
If str, choose from ATOM's predefined models . If None, Ridge is selected. stack_method: str, optional (default=\"auto\") Methods called for each base estimator. If \"auto\", it will try to invoke predict_proba , decision_function or predict in that order. passthrough: bool, optional (default=False) When False, only the predictions of estimators are used as training data for the final estimator. When True, the estimator is trained on the predictions as well as the original training data. method voting (models=None, weights=None) [source] Add a Voting instance to the models in the pipeline. Parameters: models: sequence or None, optional (default=None) Models that feed the voting. If None, it selects all models depending on the current branch. weights: sequence or None, optional (default=None) Sequence of weights (int or float) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.","title":"Methods"},{"location":"API/training/trainsizingregressor/#example","text":"from atom.training import TrainSizingRegressor # Run the pipeline trainer = TrainSizingRegressor(\"RF\", n_calls=5, n_initial_points=3) trainer.run(train, test) # Analyze the results trainer.plot_learning_curve()","title":"Example"}]} \ No newline at end of file diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 108e044f1..9f6433e06 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -1,371 +1,371 @@ http://tvdboom.github.io/ATOM/getting_started/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/user_guide/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/ATOM/atomclassifier/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/ATOM/atomregressor/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/ATOM/atommodel/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/scaler/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/cleaner/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/imputer/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/encoder/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/pruner/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/data_cleaning/balancer/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/feature_engineering/feature_generator/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/feature_engineering/feature_selector/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/directclassifier/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/directregressor/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/successivehalvingclassifier/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/successivehalvingregressor/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/trainsizingclassifier/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/training/trainsizingregressor/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/gp/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/gnb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/mnb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/bnb/ - 2021-03-26 + 2021-03-29 daily 
http://tvdboom.github.io/ATOM/API/models/catnb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/cnb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/ols/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/ridge/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/lasso/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/en/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/br/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/ard/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/lr/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/lda/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/qda/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/knn/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/rnn/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/tree/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/bag/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/et/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/rf/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/adab/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/gbm/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/xgb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/lgb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/catb/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/lsvm/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/ksvm/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/pa/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/sgd/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/models/mlp/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/transform/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/predict/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/predict_proba/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/predict_log_proba/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/decision_function/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/predicting/score/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_correlation/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_scatter_matrix/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_distribution/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_qq/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_pipeline/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_pca/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_components/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_rfecv/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_successive_halving/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_learning_curve/ - 2021-03-26 + 2021-03-29 daily 
http://tvdboom.github.io/ATOM/API/plots/plot_results/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_bo/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_evals/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_roc/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_prc/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_permutation_importance/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_feature_importance/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_partial_dependence/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_errors/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_residuals/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_confusion_matrix/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_threshold/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_probabilities/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_calibration/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_gains/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/plot_lift/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/bar_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/beeswarm_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/decision_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/force_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/heatmap_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/scatter_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/API/plots/waterfall_plot/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/faq/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/dependencies/ - 2021-03-26 + 2021-03-29 daily http://tvdboom.github.io/ATOM/license/ - 2021-03-26 + 2021-03-29 daily \ No newline at end of file diff --git a/docs/sitemap.xml.gz b/docs/sitemap.xml.gz index 32a7e93da..8c431bafb 100644 Binary files a/docs/sitemap.xml.gz and b/docs/sitemap.xml.gz differ diff --git a/docs_sources/api/ATOM/ATOMLoader.md b/docs_sources/api/ATOM/ATOMLoader.md index 9140d1181..75e3cc090 100644 --- a/docs_sources/api/ATOM/ATOMLoader.md +++ b/docs_sources/api/ATOM/ATOMLoader.md @@ -3,7 +3,7 @@
function ATOMLoader(filename, data=None, transform_data=True, verbose=None)
-
+ Load a class instance from a pickle file. If the file is a trainer that was saved using `save_data=False`, you can load new data into it. For atom pickles, you can also apply all data transformations in the diff --git a/docs_sources/api/ATOM/atomclassifier.md b/docs_sources/api/ATOM/atomclassifier.md index d48b02ae0..534db6979 100644 --- a/docs_sources/api/ATOM/atomclassifier.md +++ b/docs_sources/api/ATOM/atomclassifier.md @@ -480,7 +480,7 @@ inspect the pipeline.
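To make the loading workflow above concrete, here is a minimal sketch; the breast-cancer data is only an assumed stand-in for any `X`, `y`:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier, ATOMLoader

X, y = load_breast_cancer(return_X_y=True)

# Save the trainer without its data...
atom = ATOMClassifier(X, y, verbose=1)
atom.save("atom", save_data=False)

# ...then reload it later, attaching (and re-transforming) a dataset
atom_2 = ATOMLoader("atom", data=(X, y), transform_data=True)
```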
method add(transformer, columns=None, train_only=False)
-
+ Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's @@ -524,7 +524,7 @@ on the complete dataset.
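A hedged sketch of `add` with an external transformer, reusing the `atom` instance from the sketch above; sklearn's PCA is just an assumed choice (any transformer with a fit/transform API should work):

```python
from sklearn.decomposition import PCA

# The unfitted transformer is fitted on the training set, the data set
# is transformed, and the step is appended to the current branch
atom.add(PCA(n_components=5), columns=None, train_only=False)
```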
method apply(func, column)
-
+ Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the @@ -555,7 +555,7 @@ Name or index of the column in the dataset to create or transform.
method automl(**kwargs)
-
+ Uses the [TPOT](http://epistasislab.github.io/tpot/) package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are @@ -576,7 +576,7 @@ Keyword arguments for tpot's classifier.
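For example, TPOT's own arguments can be passed straight through `**kwargs` (the values below are assumptions):

```python
# Let TPOT search for a pipeline for roughly ten minutes
atom.automl(scoring="f1", max_time_mins=10, random_state=1)
```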
method calibrate(**kwargs)
-
+ Applies probability calibration on the winning model. The calibration is performed using sklearn's [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) class. The model is trained via cross-validation on a subset of the training data, @@ -600,7 +600,7 @@ this only if you have another, independent set for testing.
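As a small sketch, the keyword arguments are forwarded to CalibratedClassifierCV (the values are assumptions):

```python
# Calibrate the winning model's probabilities with isotonic regression
atom.calibrate(method="isotonic", cv=5)
```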
method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+ This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -640,7 +640,7 @@ Whether to render the plot.
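For instance, a rough sketch drawing two plots side by side (the chosen plot methods are only examples):

```python
# Both plots are rendered in one figure instead of two separate ones
with atom.canvas(nrows=1, ncols=2, title="Model evaluation", figsize=(12, 5)):
    atom.plot_roc()
    atom.plot_prc()
```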
method delete(models=None)
-
+ Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -659,7 +659,7 @@ Name of the models to clear from the pipeline. If None, clear all models.
method distribution(column=0)
-
+ Compute the [KS-statistic](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) for various distributions against a column in the dataset. Missing values are ignored. @@ -691,7 +691,7 @@ Dataframe with the statistic results.
method drop(columns)
-
+ Drop columns from the dataset. !!! note @@ -714,7 +714,7 @@ Names or indices of the columns to drop.
method export_pipeline(model=None)
-
+ Export atom's pipeline to a sklearn Pipeline. Optionally, you can add a model as a final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) @@ -743,7 +743,7 @@ Pipeline in the current branch as a sklearn object.
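A minimal sketch of both variants; the "LGB" tag and `X_new` are assumptions (any trained model tag and any unseen dataset would do):

```python
# Export only the transformers in the current branch...
pipeline = atom.export_pipeline()

# ...or append a trained model as final estimator and predict with it
pipeline_lgb = atom.export_pipeline(model="LGB")
predictions = pipeline_lgb.predict(X_new)
```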
method get_class_weight(dataset="train")
-
+ Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -771,7 +771,7 @@ Classes with the corresponding weights.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -792,7 +792,7 @@ Minimum verbosity level to print the message.
method report(dataset="dataset", n_rows=None, filename=None)
-
+ Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for `n_rows` > 10k.
@@ -833,7 +833,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
method reset_predictions()
-
+ Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


@@ -841,7 +841,7 @@ Use this method to free some memory before saving the trainer.
method save(filename=None, save_data=True)
-
+ Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -866,7 +866,7 @@ Whether to save the data as an attribute of the instance. If False, remember to
method save_data(filename=None, dataset="dataset")
-
+ Save the data in the current branch to a csv file.
@@ -887,7 +887,7 @@ Data set to save.
method scoring(metric=None, dataset="test", **kwargs)
-
+ Print all the models' scoring for a specific metric.
@@ -920,7 +920,7 @@ Additional keyword arguments for the metric function.
method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+ Add a Stacking instance to the models in the pipeline.
@@ -959,14 +959,14 @@ not already.
method stats()
-
+ Print basic information about the dataset.


method status()
-
+ Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's \__repr__ but will also save it to the logger. @@ -975,7 +975,7 @@ save it to the logger.
method voting(models=None, weights=None)
-
+ Add a Voting instance to the models in the pipeline.
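A rough sketch of both ensemble methods; the model tags and weights are assumptions and require models that were trained earlier in the pipeline:

```python
# Combine trained models into a stacking ensemble...
atom.stacking(models=["LR", "RF"], passthrough=False)

# ...or into a voting ensemble, weighting RF twice as heavily as LR
atom.voting(models=["LR", "RF"], weights=[1, 2])
```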
@@ -1048,7 +1048,7 @@ on one of them will automatically apply the method on the dataset in the pipelin
method scale(strategy="standard")
-
+ Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the [Scaler](../data_cleaning/scaler.md) class.


@@ -1056,8 +1056,8 @@ of raising an exception). See the [Scaler](../data_cleaning/scaler.md) class.
method clean(prohibited_types=None, strip_categorical=True, maximum_cardinality=True,
-             minimum_cardinality=True, missing_target=True, encode_target=None) 
-
+ minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) + Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: @@ -1075,7 +1075,7 @@ See [Cleaner](../data_cleaning/cleaner.md) for a description of the parameters.
method impute(strat_num="drop", strat_cat="drop", min_frac_rows=None, min_frac_cols=None, missing=None) 
-
+ Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the `missing` attribute to customize what @@ -1087,7 +1087,7 @@ the train and test set, the size of the sets may change after the transformation.
method encode(strategy="LeaveOneOut", max_onehot=10, frac_to_other=None)
-
+ Perform encoding of categorical features. The encoding type depends on the number of unique values in the column:
    @@ -1106,7 +1106,7 @@ for a description of the parameters.
    method prune(strategy="z-score", method="drop", max_sigma=3, include_target=False, **kwargs)
    -
+ Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. Only outliers from the training set are pruned @@ -1118,7 +1118,7 @@ for a description of the parameters.
    method balance(strategy="ADASYN", **kwargs)
    -
    + Balance the number of samples per target class in the target column. Only the training set is balanced in order to maintain the original distribution of target classes in the test set. See [Balancer](../data_cleaning/balancer.md) for a description of @@ -1151,7 +1151,7 @@ To further pre-process the data, you can create new non-linear features transfor
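Taken together, these methods are usually chained directly on the atom instance; a hedged sketch with example parameter values:

```python
# Standard cleaning steps, then handle missing and categorical values
atom.clean()
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="LeaveOneOut", max_onehot=10)

# Prune outliers from the training set and balance the target classes
atom.prune(strategy="z-score", max_sigma=3)
atom.balance(strategy="ADASYN")
```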
    method feature_generation(strategy="DFS", n_features=None, generations=20, population=500, operators=None)
    -
+ Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See [FeatureGenerator](../feature_engineering/feature_generator.md) for @@ -1163,7 +1163,7 @@ atom.
    method feature_selection(strategy=None, solver=None, n_features=None,
                              max_frac_repeated=1., max_correlation=1., **kwargs) 
    -
    + Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson @@ -1224,7 +1224,7 @@ as the [models](../../../user_guide/#models), and the
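A short sketch of the two feature-engineering steps described above (the strategies and numbers are assumptions):

```python
# Create new non-linear feature combinations with Deep Feature Synthesis
atom.feature_generation(strategy="DFS", n_features=10)

# Keep the most informative features and drop highly correlated ones
atom.feature_selection(strategy="PCA", n_features=12, max_correlation=0.98)
```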
    method run(models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False,
                n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [DirectClassifier](../training/directclassifier.md) instance.


    @@ -1233,7 +1233,7 @@ Runs a [DirectClassifier](../training/directclassifier.md) instance.
    method successive_halving(models, metric=None, greater_is_better=True, needs_proba=False,
                               needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5,
                               est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [SuccessiveHalvingClassifier](../training/successivehalvingclassifier.md) instance.


    @@ -1242,7 +1242,7 @@ Runs a [SuccessiveHalvingClassifier](../training/successivehalvingclassifier.md)
    method train_sizing(models, metric=None, greater_is_better=True, needs_proba=False,
                         needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0,
                         n_initial_points=5, est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [TrainSizingClassifier](../training/trainsizingclassifier.md) instance.
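The three training approaches can be sketched as follows (model tags, metric and budgets are assumptions):

```python
import numpy as np

# Direct: fit and evaluate every model once on the full training set
atom.run(models=["LR", "RF", "LGB"], metric="f1", n_calls=10, n_initial_points=5)

# Successive halving: start with all models, halving the candidates each run
atom.successive_halving(models=["LR", "RF", "LGB", "XGB"], metric="f1", skip_runs=0)

# Train sizing: refit on increasingly large fractions of the training set
atom.train_sizing(models="RF", metric="f1", train_sizes=np.linspace(0.2, 1.0, 5))
```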


    diff --git a/docs_sources/api/ATOM/atommodel.md b/docs_sources/api/ATOM/atommodel.md index 36141c5bc..5385a2f09 100644 --- a/docs_sources/api/ATOM/atommodel.md +++ b/docs_sources/api/ATOM/atommodel.md @@ -2,8 +2,8 @@ -----------
    function ATOMModel(estimator, acronym=None, fullname=None, needs_scaling=False)
    -
    -Convert an estimator to a model that can be ingested by ATOM. + +Convert an estimator to a model that can be ingested by atom.
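A hedged sketch of wrapping a custom estimator; the RANSACRegressor choice, the acronym and the dataset names are assumptions:

```python
from sklearn.linear_model import RANSACRegressor
from atom import ATOMModel, ATOMRegressor

ransac = ATOMModel(
    RANSACRegressor(),
    acronym="RANSAC",
    fullname="Random Sample Consensus",
    needs_scaling=True,
)

atom = ATOMRegressor(X_reg, y_reg)  # X_reg, y_reg: an assumed regression dataset
atom.run(ransac)                    # custom models are passed to run like built-ins
```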
diff --git a/docs_sources/api/ATOM/atomregressor.md b/docs_sources/api/ATOM/atomregressor.md index 84db5d65b..a91d97888 100644 --- a/docs_sources/api/ATOM/atomregressor.md +++ b/docs_sources/api/ATOM/atomregressor.md @@ -3,7 +3,7 @@
class atom.api.ATOMRegressor(*arrays, y=-1, n_rows=1, test_size=0.2, logger=None,
                              n_jobs=1, warnings=True, verbose=0, random_state=None)
-
+ ATOMRegressor is ATOM's wrapper for regression tasks. Use this class to easily apply all data transformations and model management provided by the package on a given dataset. Note that contrary to sklearn's API, an ATOMRegressor instance already @@ -458,7 +458,7 @@ inspect the pipeline.
method add(transformer, columns=None, train_only=False)
-
+ Add a transformer to the current branch. If the transformer is not fitted, it is fitted on the complete training set. Afterwards, the data set is transformed and the transformer is added to atom's @@ -502,7 +502,7 @@ on the complete dataset.
method apply(func, column)
-
+ Transform one column in the dataset using a function (can be a lambda). If the provided column is present in the dataset, that same column is transformed. If it's not a column in the @@ -533,7 +533,7 @@ Name or index of the column in the dataset to create or transform.
method automl(**kwargs)
-
+ Uses the [TPOT](http://epistasislab.github.io/tpot/) package to perform an automated search of transformers and a final estimator that maximizes a metric on the dataset. The resulting transformations and estimator are @@ -554,7 +554,7 @@ Keyword arguments for tpot's regressor.
method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
-
+ This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -594,7 +594,7 @@ Whether to render the plot.
method delete(models=None)
-
+ Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -613,7 +613,7 @@ Name of the models to clear from the pipeline. If None, clear all models.
method distribution(column=0)
-
+ Compute the [KS-statistic](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) for various distributions against a column in the dataset. Missing values are ignored. @@ -645,7 +645,7 @@ Dataframe with the statistic results.
method drop(columns)
-
+ Drop columns from the dataset. !!! note @@ -668,7 +668,7 @@ Names or indices of the columns to drop.
method export_pipeline(model=None)
-
+ Export atom's pipeline to a sklearn Pipeline. Optionally, you can add a model as a final estimator. If the model needs feature scaling and there is no scaler in the pipeline, a [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) @@ -697,7 +697,7 @@ Pipeline in the current branch as a sklearn object.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
Parameters:
@@ -718,7 +718,7 @@ Minimum verbosity level to print the message.
method report(dataset="dataset", n_rows=None, filename=None)
-
+ Create an extensive profile analysis report of the data. The report is rendered in HTML5 and CSS3. Note that this method can be slow for `n_rows` > 10k.
@@ -759,7 +759,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
method reset_predictions()
-
+ Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


@@ -767,7 +767,7 @@ Use this method to free some memory before saving the trainer.
method save(filename=None, save_data=True)
-
+ Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -792,7 +792,7 @@ Whether to save the data as an attribute of the instance. If False, remember to
method save_data(filename=None, dataset="dataset")
-
+ Save the data in the current branch to a csv file.
@@ -813,7 +813,7 @@ Data set to save.
method scoring(metric=None, dataset="test", **kwargs)
-
+ Print all the models' scoring for a specific metric.
@@ -834,7 +834,7 @@ Additional keyword arguments for the metric function.
method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
-
+ Add a Stacking instance to the models in the pipeline.
@@ -873,14 +873,14 @@ not already.
method stats()
-
+ Print basic information about the dataset.


method status()
-
+ Get an overview of the branches, models and errors in the current instance. This method prints the same information as atom's \__repr__ but will also save it to the logger. @@ -889,7 +889,7 @@ save it to the logger.
method voting(models=None, weights=None)
-
+ Add a Voting instance to the models in the pipeline.
@@ -957,7 +957,7 @@ automatically apply the method on the dataset in the pipeline.
method scale(strategy="standard")
-
+ Applies one of sklearn's scalers. Non-numerical columns are ignored (instead of raising an exception). See the [Scaler](../data_cleaning/scaler.md) class.


@@ -965,8 +965,8 @@ of raising an exception). See the [Scaler](../data_cleaning/scaler.md) class.
method clean(prohibited_types=None, strip_categorical=True, maximum_cardinality=True,
-             minimum_cardinality=True, missing_target=True, encode_target=None) 
-
+ minimum_cardinality=True, drop_duplicates=False, missing_target=True, encode_target=None) + Applies standard data cleaning steps on the dataset. Use the parameters to choose which transformations to perform. The available steps are: @@ -984,7 +984,7 @@ See [Cleaner](../data_cleaning/cleaner.md) for a description of the parameters.
method impute(strat_num="drop", strat_cat="drop", min_frac_rows=None, min_frac_cols=None, missing=None) 
-
+ Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. The imputer is fitted only on the training set to avoid data leakage. Use the `missing` attribute to customize what @@ -996,7 +996,7 @@ the train and test set, the size of the sets may change after the transformation.
method encode(strategy="LeaveOneOut", max_onehot=10, frac_to_other=None)
-
+ Perform encoding of categorical features. The encoding type depends on the number of unique values in the column:
    @@ -1015,7 +1015,7 @@ for a description of the parameters.
    method prune(strategy="z-score", method="drop", max_sigma=3, include_target=False, **kwargs)
    -
+ Prune outliers from the training set. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. Only outliers from the training set are pruned @@ -1050,7 +1050,7 @@ of the provided strategies.
    method feature_generation(strategy="DFS", n_features=None, generations=20, population=500, operators=None)
    -
+ Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features to capture the non-linear relations between the original features. See [FeatureGenerator](../feature_engineering/feature_generator.md) for @@ -1062,7 +1062,7 @@ atom.
    method feature_selection(strategy=None, solver=None, n_features=None,
                              max_frac_repeated=1., max_correlation=1., **kwargs) 
    -
    + Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Also removes features with too low variance and finds pairs of collinear features based on the Pearson @@ -1123,7 +1123,7 @@ as the [models](../../../user_guide/#models), and the
    method run(models, metric=None, greater_is_better=True, needs_proba=False, needs_threshold=False,
                n_calls=10, n_initial_points=5, est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [DirectRegressor](../training/directregressor.md) instance.


    @@ -1132,7 +1132,7 @@ Runs a [DirectRegressor](../training/directregressor.md) instance.
    method successive_halving(models, metric=None, greater_is_better=True, needs_proba=False,
                               needs_threshold=False, skip_runs=0, n_calls=0, n_initial_points=5,
                               est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [SuccessiveHalvingRegressor](../training/successivehalvingregressor.md) instance.


    @@ -1141,7 +1141,7 @@ Runs a [SuccessiveHalvingRegressor](../training/successivehalvingregressor.md) i
    method train_sizing(models, metric=None, greater_is_better=True, needs_proba=False,
                         needs_threshold=False, train_sizes=np.linspace(0.2, 1.0, 5), n_calls=0,
                         n_initial_points=5, est_params=None, bo_params=None, bagging=0) 
    -
    + Runs a [TrainSizingRegressor](../training/trainsizingregressor.md) instance.


    diff --git a/docs_sources/api/data_cleaning/balancer.md b/docs_sources/api/data_cleaning/balancer.md index 54bd95266..411bd302c 100644 --- a/docs_sources/api/data_cleaning/balancer.md +++ b/docs_sources/api/data_cleaning/balancer.md @@ -3,7 +3,7 @@
    class atom.data_cleaning.Balancer(strategy="ADASYN", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs)
    -
    + Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the [balance](../../ATOM/atomclassifier/#balance) method. Read more in @@ -150,7 +150,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
@@ -171,7 +171,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -213,7 +213,7 @@ Estimator instance.
method transform(X, y) 
-
+ Oversample or undersample the data.
diff --git a/docs_sources/api/data_cleaning/cleaner.md b/docs_sources/api/data_cleaning/cleaner.md index a39c664a1..d6973a908 100644 --- a/docs_sources/api/data_cleaning/cleaner.md +++ b/docs_sources/api/data_cleaning/cleaner.md @@ -4,7 +4,7 @@
class atom.data_cleaning.Cleaner(prohibited_types=None, maximum_cardinality=True, minimum_cardinality=True,
                                  strip_categorical=True, drop_duplicates=False, missing_target=True,
                                  encode_target=True, verbose=0, logger=None)
-
+ Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are: @@ -210,7 +210,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -231,7 +231,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -276,7 +276,7 @@ Estimator instance.
method transform(X, y=None) 
-
+ Apply the data cleaning steps on the data.
diff --git a/docs_sources/api/data_cleaning/encoder.md b/docs_sources/api/data_cleaning/encoder.md index 92cf72cc9..dcafb9975 100644 --- a/docs_sources/api/data_cleaning/encoder.md +++ b/docs_sources/api/data_cleaning/encoder.md @@ -3,7 +3,7 @@
class atom.data_cleaning.Encoder(strategy="LeaveOneOut", max_onehot=10,
                                  frac_to_other=None, verbose=0, logger=None, **kwargs)
-
+ Perform encoding of categorical features. The encoding type depends on the number of classes in the column: @@ -120,7 +120,7 @@ Additional keyword arguments passed to the strategy estimator.
method fit(X, y) 
-
+ Fit to data.
@@ -210,7 +210,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -231,7 +231,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -273,7 +273,7 @@ Estimator instance.
method transform(X, y=None) 
-
+ Encode the data.
diff --git a/docs_sources/api/data_cleaning/imputer.md b/docs_sources/api/data_cleaning/imputer.md index da431b412..d351c5c39 100644 --- a/docs_sources/api/data_cleaning/imputer.md +++ b/docs_sources/api/data_cleaning/imputer.md @@ -3,7 +3,7 @@
class atom.data_cleaning.Imputer(strat_num="drop", strat_cat="drop", min_frac_rows=None,
                                  min_frac_cols=None, verbose=0, logger=None)
-
+ Impute or remove missing values according to the selected strategy. Also removes rows and columns with too many missing values. Use the `missing` attribute to customize what are considered "missing @@ -143,7 +143,7 @@ considered missing since they are incompatible with sklearn estimators.
method fit(X, y=None) 
-
+ Fit to data.
@@ -237,7 +237,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -258,7 +258,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -300,7 +300,7 @@ Estimator instance.
method transform(X, y=None) 
-
+ Impute the data. Note that leaving y=None can lead to inconsistencies in data length between X and y if rows are dropped during the transformation.
diff --git a/docs_sources/api/data_cleaning/pruner.md b/docs_sources/api/data_cleaning/pruner.md index 237f9923a..7c36c4f06 100644 --- a/docs_sources/api/data_cleaning/pruner.md +++ b/docs_sources/api/data_cleaning/pruner.md @@ -3,7 +3,7 @@
class atom.data_cleaning.Pruner(strategy="z-score", method="drop", max_sigma=3,
                                 include_target=False, verbose=0, logger=None, **kwargs)
-
+ Replace or remove outliers. The definition of outlier depends on the selected strategy and can greatly differ from one another. Ignores categorical columns. This class can be accessed @@ -159,7 +159,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -180,7 +180,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -222,7 +222,7 @@ Estimator instance.
method transform(X, y=None) 
-
+ Apply the outlier strategy on the data.
diff --git a/docs_sources/api/data_cleaning/scaler.md b/docs_sources/api/data_cleaning/scaler.md index 2104b2093..0be6dbf5d 100644 --- a/docs_sources/api/data_cleaning/scaler.md +++ b/docs_sources/api/data_cleaning/scaler.md @@ -2,7 +2,7 @@ --------
class atom.data_cleaning.Scaler(strategy="standard", verbose=0, logger=None)
-
+ This class applies one of sklearn's scalers. It also returns a dataframe when one is provided as input, and it ignores non-numerical columns (instead of raising an exception). This class can be accessed from atom through the @@ -114,7 +114,7 @@ Estimator's instance with which the data is scaled.
method fit(X, y=None) 
-
+ Compute the mean and std to be used for later scaling.
@@ -197,7 +197,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -218,7 +218,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
@@ -260,7 +260,7 @@ Estimator instance.
method transform(X, y=None) 
-
+ Perform standardization by centering and scaling.
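Like the other data cleaning classes, Scaler can also be used on its own, following sklearn's fit/transform convention; a minimal sketch with assumed `X_train`/`X_test` dataframes:

```python
from atom.data_cleaning import Scaler

scaler = Scaler(strategy="standard", verbose=1)
scaler.fit(X_train)                # statistics are computed on the training set only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # the same statistics are reused for the test set
```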
diff --git a/docs_sources/api/feature_engineering/feature_generator.md b/docs_sources/api/feature_engineering/feature_generator.md index e487c1181..1ab58166f 100644 --- a/docs_sources/api/feature_engineering/feature_generator.md +++ b/docs_sources/api/feature_engineering/feature_generator.md @@ -263,7 +263,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -284,7 +284,7 @@ Minimum verbosity level to print the message.
method save(filename=None)
-
+ Save the instance to a pickle file.
diff --git a/docs_sources/api/feature_engineering/feature_selector.md b/docs_sources/api/feature_engineering/feature_selector.md index 687af11f9..096538921 100644 --- a/docs_sources/api/feature_engineering/feature_selector.md +++ b/docs_sources/api/feature_engineering/feature_selector.md @@ -381,7 +381,7 @@ Dictionary of the parameter names mapped to their values.
method log(msg, level=0)
-
+ Write a message to the logger and print it to stdout.
@@ -402,7 +402,7 @@ Minimum verbosity level to print the message.
method plot_pca(title=None, figsize=(10, 6), filename=None, display=True)
-
+ Plot the explained variance ratio vs the number of components. See [plot_pca](../../plots/plot_pca) for a description of the parameters.


@@ -410,7 +410,7 @@ See [plot_pca](../../plots/plot_pca) for a description of the parameters.
method plot_components(show=None, title=None, figsize=None, filename=None, display=True)
-
+ Plot the explained variance ratio per components. See [plot_components](../../plots/plot_components) for a description of the parameters.


@@ -418,7 +418,7 @@ See [plot_components](../../plots/plot_components) for a description of the para
method plot_rfecv(title=None, figsize=(10, 6), filename=None, display=True)
-
+ Plot the scores obtained by the estimator fitted on every subset of the data. See [plot_rfecv](../../plots/plot_rfecv) for a description of the parameters.


@@ -433,7 +433,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
method save(filename=None)
-
+ Save the instance to a pickle file.
diff --git a/docs_sources/api/models/adab.md b/docs_sources/api/models/adab.md index 8da387340..a2f62940e 100644 --- a/docs_sources/api/models/adab.md +++ b/docs_sources/api/models/adab.md @@ -2,9 +2,9 @@ ----------------- AdaBoost is a meta-estimator that begins by fitting a classifier/regressor on - the original dataset and then fits additional copies of the algorithm on the - same dataset but where the weights of instances are adjusted according to the - error of the current prediction. +the original dataset and then fits additional copies of the algorithm on the +same dataset but where the weights of instances are adjusted according to the +error of the current prediction. Corresponding estimators are: @@ -20,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/e ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `algorithm` parameter is only used with AdaBoostClassifier. * The `loss` parameter is only used with AdaBoostRegressor. * The `random_state` parameter is set equal to that of the trainer. @@ -152,7 +152,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -164,8 +165,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -177,7 +178,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -187,9 +189,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Index includes:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -210,9 +212,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -268,8 +270,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.adab.plot_permutation_importance()` - or `atom.adab.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.adab.plot_permutation_importance()` +or `atom.adab.predict(X)`. The remaining utility methods can be found hereunder:

@@ -308,13 +310,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -333,15 +335,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -356,15 +359,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -373,7 +376,7 @@ Get the scoring for a specific metric. metric: str or None, optional (default=None)
Name of the metric to calculate. Choose from any of sklearn's SCORERS - or one of the following custom metrics (only if classifier): +or one of the following custom metrics (only if classifier):
  • "cm" for the confusion matrix.
  • "tn" for true negatives.
  • @@ -409,7 +412,7 @@ Model's score for the selected metric.
    method save_estimator(filename=None)
    -
    + Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/ard.md b/docs_sources/api/models/ard.md index 7aca841c8..2bcd32321 100644 --- a/docs_sources/api/models/ard.md +++ b/docs_sources/api/models/ard.md @@ -2,9 +2,9 @@ ----------------------------------------- Automatic Relevance Determination is very similar to [Bayesian Ridge](../br), but - can lead to sparser coefficients. Fit the weights of a regression model, using an - ARD prior. The weights of the regression model are assumed to be in Gaussian - distributions. +can lead to sparser coefficients. Fit the weights of a regression model, using an +ARD prior. The weights of the regression model are assumed to be in Gaussian +distributions. Corresponding estimators are: @@ -18,9 +18,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
@@ -149,7 +149,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -161,8 +162,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -174,7 +175,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -184,9 +186,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Index includes:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -207,9 +209,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -241,8 +243,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.ard.plot_permutation_importance()` - or `atom.ard.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.ard.plot_permutation_importance()` +or `atom.ard.predict(X)`. The remaining utility methods can be found hereunder:

@@ -276,15 +278,16 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
@@ -299,15 +302,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -340,7 +343,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/bag.md b/docs_sources/api/models/bag.md index 26e3ef569..6f0563c69 100644 --- a/docs_sources/api/models/bag.md +++ b/docs_sources/api/models/bag.md @@ -1,12 +1,13 @@ # Bagging (Bag) --------------- -Bagging uses an ensemble meta-estimator that fits base classifiers/regressors each on - random subsets of the original dataset and then aggregate their individual predictions - (either by voting or by averaging) to form a final prediction. Such a meta-estimator - can typically be used as a way to reduce the variance of a black-box estimator - (e.g., a [decision tree](../tree)), by introducing randomization into its construction - procedure and then making an ensemble out of it. +Bagging uses an ensemble meta-estimator that fits base classifiers/regressors +each on random subsets of the original dataset and then aggregate their +individual predictions (either by voting or by averaging) to form a final +prediction. Such a meta-estimator can typically be used as a way to reduce +the variance of a black-box estimator (e.g., a [decision tree](../tree)), +by introducing randomization into its construction procedure and then +making an ensemble out of it. Corresponding estimators are: @@ -22,9 +23,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/e ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -158,7 +159,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -170,8 +172,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -183,7 +185,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -193,9 +196,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Index includes:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -216,9 +219,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -266,8 +269,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.bag.plot_permutation_importance()` - or `atom.bag.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.bag.plot_permutation_importance()` +or `atom.bag.predict(X)`. The remaining utility methods can be found hereunder:

@@ -306,13 +309,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -331,15 +334,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -354,15 +358,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -407,7 +411,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/bnb.md b/docs_sources/api/models/bnb.md index d614dc4ca..7dafaec1b 100644 --- a/docs_sources/api/models/bnb.md +++ b/docs_sources/api/models/bnb.md @@ -1,10 +1,10 @@ # Bernoulli Naive Bayes (BNB) ----------------------------- -Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate Bernoulli - models. Like [Multinomial Naive bayes (MNB)](../mnb), this classifier is suitable for - discrete data. The difference is that while MNB works with occurrence counts, BNB - is designed for binary/boolean features. +Bernoulli Naive Bayes implements the Naive Bayes algorithm for multivariate +Bernoulli models. Like [Multinomial Naive bayes (MNB)](../mnb), this +classifier is suitable for discrete data. The difference is that while +MNB works with occurrence counts, BNB is designed for binary/boolean features. Corresponding estimators are: @@ -18,9 +18,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/g ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
@@ -140,7 +140,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -152,8 +153,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -165,7 +166,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -175,9 +177,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Index includes:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -198,9 +200,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -248,8 +250,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.bnb.plot_permutation_importance()` or `atom.bnb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.bnb.plot_permutation_importance()` or `atom.bnb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -288,13 +290,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -313,15 +315,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -336,15 +339,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -389,7 +392,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
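To make the `scoring` method documented above concrete, a small hedged example; the metric name "roc_auc" and `dataset="train"` are assumed to be accepted values:

```python
# Continuing the BNB sketch above: query scores without refitting the model.
print(atom.bnb.scoring())                       # the trainer's own metric on the test set
print(atom.bnb.scoring("roc_auc"))              # a different metric (assumed name)
print(atom.bnb.scoring("f1", dataset="train"))  # same call evaluated on the training set
```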
diff --git a/docs_sources/api/models/br.md b/docs_sources/api/models/br.md index fa2e1af4e..d97d5f8c5 100644 --- a/docs_sources/api/models/br.md +++ b/docs_sources/api/models/br.md @@ -1,9 +1,9 @@ # Bayesian Ridge (BR) --------------------- -Bayesian regression techniques can be used to include regularization parameters in the - estimation procedure: the regularization parameter is not set in a hard sense but - tuned to the data at hand. +Bayesian regression techniques can be used to include regularization +parameters in the estimation procedure: the regularization parameter +is not set in a hard sense but tuned to the data at hand. Corresponding estimators are: @@ -17,9 +17,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
@@ -148,7 +148,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -160,8 +161,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -173,7 +174,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -183,9 +185,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -206,9 +208,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -240,8 +242,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.br.plot_permutation_importance()` - or `atom.br.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.br.plot_permutation_importance()` +or `atom.br.predict(X)`. The remaining utility methods can be found hereunder:

@@ -275,15 +277,16 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
@@ -298,15 +301,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -339,7 +342,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
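Since this diff changes `results` from a DataFrame to a Series, a hedged sketch of reading it for a Bayesian Ridge run; the data, the `run` arguments and the exact index labels are assumptions based on the list above:

```python
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True)

atom = ATOMRegressor(X, y, random_state=1)
atom.run(models="BR", metric="r2", n_calls=10)

res = atom.br.results    # pd.Series of this model's training results
print(res["metric_bo"])  # best score reached during the BO
print(res["time_fit"])   # fit time on the complete training set (assumed label)
```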
diff --git a/docs_sources/api/models/catb.md b/docs_sources/api/models/catb.md index f311fab3a..c34de7066 100644 --- a/docs_sources/api/models/catb.md +++ b/docs_sources/api/models/catb.md @@ -1,8 +1,8 @@ # CatBoost (CatB) ----------------- -CatBoost is a machine learning method based on gradient boosting over decision trees. - Main advantages of CatBoost: +CatBoost is a machine learning method based on gradient boosting over +decision trees. Main advantages of CatBoost: * Superior quality when compared with other GBDT models on many datasets. * Best in class prediction speed. @@ -17,17 +17,17 @@ Corresponding estimators are: Read more in CatBoost's [documentation](https://catboost.ai/). !!!note - CatBoost allows [early stopping](../../../user_guide/#early-stopping) to stop - the training of unpromising models prematurely! + CatBoost allows [early stopping](../../../user_guide/#early-stopping) + to stop the training of unpromising models prematurely!

## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `bootstrap_type` parameter is set to "Bernoulli" to allow for the `subsample` parameter. * The `num_leaves` and `min_child_samples` parameters are not available for the CPU implementation. @@ -167,7 +167,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -179,8 +180,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -192,8 +193,9 @@ Metric score(s) on the test set.
evals: dict
-Dictionary of the metric calculated during training. The metric is provided by the estimator's - package and is different for every task. Available keys are: +Dictionary of the metric calculated during training. The metric is +provided by the estimator's package and is different for every task. +Available keys are:
  • "metric": Name of the metric.
  • "train": List of scores calculated on the training set.
  • @@ -202,7 +204,8 @@ Dictionary of the metric calculated during training. The metric is provided by t
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -212,9 +215,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -235,9 +238,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -285,8 +288,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.catb.plot_permutation_importance()` or `atom.catb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.catb.plot_permutation_importance()` or `atom.catb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -325,13 +328,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -350,15 +353,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -373,15 +377,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -426,7 +430,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
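A hedged sketch of inspecting the `evals` dictionary documented above after training CatBoost; whether it is filled without the early-stopping option enabled is not stated on this page, and the data and `run` arguments are assumptions:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="CatB", metric="f1")  # requires the catboost package

evals = atom.catb.evals
print(evals["metric"])     # name of the metric provided by CatBoost
print(evals["train"][-1])  # last score calculated on the training set
print(evals["test"][-1])   # last score calculated on the test set
```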
diff --git a/docs_sources/api/models/catnb.md b/docs_sources/api/models/catnb.md index d2ba09651..5259c1b03 100644 --- a/docs_sources/api/models/catnb.md +++ b/docs_sources/api/models/catnb.md @@ -1,7 +1,8 @@ # Categorical Naive Bayes (CatNB) --------------------------------- -Categorical Naive Bayes implements the Naive Bayes algorithm for categorical features. +Categorical Naive Bayes implements the Naive Bayes algorithm for +categorical features. Corresponding estimators are: @@ -15,9 +16,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
@@ -138,7 +139,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -150,8 +152,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -163,7 +165,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -173,9 +176,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -196,9 +199,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -246,8 +249,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.catnb.plot_permutation_importance()` or `atom.catnb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.catnb.plot_permutation_importance()` or `atom.catnb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -286,13 +289,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -311,15 +314,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -334,15 +338,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -387,7 +391,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/cnb.md b/docs_sources/api/models/cnb.md index 532e9b334..b0e41ade6 100644 --- a/docs_sources/api/models/cnb.md +++ b/docs_sources/api/models/cnb.md @@ -1,9 +1,9 @@ # Complement Naive Bayes (CNB) -------------------------------- -The Complement Naive Bayes classifier was designed to correct the “severe assumptions” - made by the standard [Multinomial Naive Bayes](../mnb) classifier. It is particularly - suited for imbalanced data sets. +The Complement Naive Bayes classifier was designed to correct the +“severe assumptions” made by the standard [Multinomial Naive Bayes](../mnb) +classifier. It is particularly suited for imbalanced data sets. Corresponding estimators are: @@ -17,9 +17,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
@@ -144,7 +144,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -156,8 +157,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -169,7 +170,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -179,9 +181,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -202,9 +204,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -252,8 +254,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.cnb.plot_permutation_importance()` or `atom.cnb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.cnb.plot_permutation_importance()` or `atom.cnb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -292,13 +294,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -317,15 +319,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -340,15 +343,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -393,7 +396,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
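A hedged sketch of `save_estimator` followed by reloading the pickle file; the explicit `.pkl` filename is an assumption, the page only states that a pickle file is written:

```python
import pickle

from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="CNB", metric="f1")

atom.cnb.save_estimator("cnb_estimator.pkl")  # writes the fitted estimator to disk

with open("cnb_estimator.pkl", "rb") as f:    # reload it as a plain sklearn estimator
    estimator = pickle.load(f)
print(estimator.predict(X[:5]))
```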
diff --git a/docs_sources/api/models/en.md b/docs_sources/api/models/en.md index 2a12c8317..af98a1451 100644 --- a/docs_sources/api/models/en.md +++ b/docs_sources/api/models/en.md @@ -15,9 +15,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `random_state` parameter is set equal to that of the trainer.
@@ -139,7 +139,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -151,8 +152,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -164,7 +165,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -174,9 +176,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -197,9 +199,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -231,8 +233,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.en.plot_permutation_importance()` - or `atom.en.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.en.plot_permutation_importance()` +or `atom.en.predict(X)`. The remaining utility methods can be found hereunder:

@@ -266,15 +268,16 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
@@ -289,15 +292,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -330,7 +333,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
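A hedged sketch of `delete`, removing one model from a trainer that holds several; the list-style `models` argument and the `atom.models` attribute are assumptions:

```python
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True)

atom = ATOMRegressor(X, y, random_state=1)
atom.run(models=["EN", "BR"], metric="r2")

atom.en.delete()    # remove Elastic Net from the trainer and free its memory
print(atom.models)  # remaining models, e.g. only BR (assumed attribute)
```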
diff --git a/docs_sources/api/models/et.md b/docs_sources/api/models/et.md index f05c02a0a..452fdfd03 100644 --- a/docs_sources/api/models/et.md +++ b/docs_sources/api/models/et.md @@ -1,9 +1,10 @@ # Extra-Trees (ET) ------------------ -Extra-Trees use a meta estimator that fits a number of randomized decision trees - (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging - to improve the predictive accuracy and control over-fitting. +Extra-Trees use a meta estimator that fits a number of randomized +decision trees (a.k.a. extra-trees) on various sub-samples of the +dataset and uses averaging to improve the predictive accuracy and +control over-fitting. Corresponding estimators are: @@ -19,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/e ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `max_samples` parameter is only used when bootstrap = True. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -176,7 +177,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -188,8 +190,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -201,7 +203,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -211,9 +214,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -234,9 +237,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -284,8 +287,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.et.plot_permutation_importance()` - or `atom.et.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.et.plot_permutation_importance()` +or `atom.et.predict(X)`. The remaining utility methods can be found hereunder:

@@ -324,13 +327,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -349,15 +352,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -372,15 +376,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -425,7 +429,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/gbm.md b/docs_sources/api/models/gbm.md index 612ac89c8..282f3782d 100644 --- a/docs_sources/api/models/gbm.md +++ b/docs_sources/api/models/gbm.md @@ -1,11 +1,12 @@ # Gradient Boosting Machine (GBM) --------------------------------- -A Gradient Boosting Machine builds an additive model in a forward stage-wise - fashion; it allows for the optimization of arbitrary differentiable loss - functions. In each stage `n_classes_` regression trees are fit on the negative - gradient of the binomial or multinomial deviance loss function. Binary - classification is a special case where only a single regression tree is induced. +A Gradient Boosting Machine builds an additive model in a forward +stage-wise fashion; it allows for the optimization of arbitrary +differentiable loss functions. In each stage `n_classes_` regression +trees are fit on the negative gradient of the binomial or multinomial +deviance loss function. Binary classification is a special case where +only a single regression tree is induced. Corresponding estimators are: @@ -21,9 +22,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/e ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * For multiclass classification tasks, the `loss` parameter is always set to "deviance". * The `alpha` parameter is only used when loss = "huber" or "quantile". * The `random_state` parameter is set equal to that of the trainer. @@ -186,7 +187,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -198,8 +200,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -211,7 +213,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -221,9 +224,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -244,9 +247,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -302,8 +305,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.gbm.plot_permutation_importance()` - or `atom.gbm.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.gbm.plot_permutation_importance()` +or `atom.gbm.predict(X)`. The remaining utility methods can be found hereunder:

@@ -342,13 +345,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -367,15 +370,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -390,15 +394,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -443,7 +447,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/gnb.md b/docs_sources/api/models/gnb.md index ea041d4be..d50f14b79 100644 --- a/docs_sources/api/models/gnb.md +++ b/docs_sources/api/models/gnb.md @@ -1,8 +1,9 @@ # Gaussian Naive bayes (GNB) ---------------------------- -Gaussian Naive Bayes implements the Naive Bayes algorithm for classification. The - likelihood of the features is assumed to be Gaussian. +Gaussian Naive Bayes implements the Naive Bayes algorithm for +classification. The likelihood of the features is assumed to +be Gaussian. Corresponding estimators are: @@ -16,9 +17,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * GNB has no parameters to tune with the BO. @@ -113,8 +114,8 @@ Estimator instance fitted on the complete training set. time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -126,7 +127,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -158,9 +160,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
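A hedged sketch of the lazy evaluation described above; `score_test` is an assumed prediction-attribute name taken from the surrounding context, and the data and `run` arguments are assumptions:

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="GNB", metric="f1")

first = atom.gnb.score_test   # calculated on this first access...
second = atom.gnb.score_test  # ...and served from the cached value afterwards
atom.gnb.reset_predictions()  # clear the cached prediction attributes to free memory
```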
@@ -208,8 +210,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.gnb.plot_permutation_importance()` or `atom.gnb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.gnb.plot_permutation_importance()` or `atom.gnb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -248,13 +250,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -273,15 +275,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -296,15 +299,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -349,7 +352,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/gp.md b/docs_sources/api/models/gp.md index d7a5fc78e..25856c3b9 100644 --- a/docs_sources/api/models/gp.md +++ b/docs_sources/api/models/gp.md @@ -1,19 +1,22 @@ # Gaussian Process (GP) ----------------------- -Gaussian Processes are a generic supervised learning method designed to solve - regression and probabilistic classification problems. The advantages of Gaussian processes are: +Gaussian Processes are a generic supervised learning method designed +to solve regression and probabilistic classification problems. The +advantages of Gaussian processes are: * The prediction interpolates the observations. -* The prediction is probabilistic (Gaussian) so that one can compute empirical confidence - intervals and decide based on those if one should refit (online fitting, adaptive fitting) - the prediction in some region of interest. +* The prediction is probabilistic (Gaussian) so that one can compute + empirical confidence intervals and decide based on those if one + should refit (online fitting, adaptive fitting) the prediction in + some region of interest. The disadvantages of Gaussian processes include: -* They are not sparse, i.e. they use the whole samples/features information to perform the prediction. -* They lose efficiency in high dimensional spaces, namely when the number of features - exceeds a few dozens. +* They are not sparse, i.e. they use the whole samples/features + information to perform the prediction. +* They lose efficiency in high dimensional spaces, namely when the + number of features exceeds a few dozens. Corresponding estimators are: @@ -29,9 +32,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/g ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * GP has no parameters to tune with the BO. @@ -125,8 +128,8 @@ Estimator instance fitted on the complete training set. time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -138,7 +141,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -170,9 +174,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -220,8 +224,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.gp.plot_permutation_importance()` or `atom.gp.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.gp.plot_permutation_importance()` or `atom.gp.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -260,13 +264,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -285,15 +289,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -308,15 +313,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -361,7 +366,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/knn.md b/docs_sources/api/models/knn.md index 4bd8ffd4a..a12de71c0 100644 --- a/docs_sources/api/models/knn.md +++ b/docs_sources/api/models/knn.md @@ -1,9 +1,10 @@ # K-Nearest Neighbors (KNN) --------------------------- -K-Nearest Neighbors, as the name clearly indicates, implements the k-nearest - neighbors vote. For regression, the target is predicted by local interpolation - of the targets associated of the nearest neighbors in the training set. +K-Nearest Neighbors, as the name clearly indicates, implements the +k-nearest neighbors vote. For regression, the target is predicted +by local interpolation of the targets associated with the nearest +neighbors in the training set. Corresponding estimators are: @@ -19,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` parameter is set equal to that of the trainer.
@@ -153,7 +154,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -165,8 +167,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -178,7 +180,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -188,9 +191,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. The index can include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -211,9 +214,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -261,8 +264,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.knn.plot_permutation_importance()` - or `atom.knn.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.knn.plot_permutation_importance()` +or `atom.knn.predict(X)`. The remaining utility methods can be found hereunder:

@@ -301,13 +304,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -326,15 +329,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -349,15 +353,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -402,7 +406,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/ksvm.md b/docs_sources/api/models/ksvm.md index 7e7675985..55ab9634e 100644 --- a/docs_sources/api/models/ksvm.md +++ b/docs_sources/api/models/ksvm.md @@ -2,10 +2,10 @@ ------------------- The implementation of the Kernel (non-linear) Support Vector Machine is - based on libsvm. The fit time scales at least quadratically with the number - of samples and may be impractical beyond tens of thousands of samples. For - large datasets consider using a [Linear Support Vector Machine](../lsvm) - or a [Stochastic Gradient descent](../sgd) model instead. +based on libsvm. The fit time scales at least quadratically with the +number of samples and may be impractical beyond tens of thousands of +samples. For large datasets consider using a [Linear Support Vector Machine](../lsvm) +or a [Stochastic Gradient descent](../sgd) model instead. The multiclass support is handled according to a one-vs-one scheme. @@ -23,9 +23,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/s ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `degree` parameter is only used when kernel = "poly". * The `gamma` parameter is always set to "scale" when kernel = "poly". * The `coef0` parameter is only used when kernel = "rbf". @@ -164,7 +164,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -176,8 +177,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -189,7 +190,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -199,9 +201,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -222,9 +224,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
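For illustration, a minimal sketch of how these attributes might be inspected after a run; the entry labels and attribute names follow the lists on this page and should be treated as assumptions if your version differs:

```python
# Sketch: inspect a trained kSVM model (assumes atom.run("kSVM") was called
# on a fitted ATOMClassifier instance named `atom`).
summary = atom.ksvm.results        # pd.Series with metric_bo, time_fit, ... as entries
print(summary["metric_test"])      # metric score(s) on the test set

# Prediction attributes are computed lazily on first access and then cached.
predictions = atom.ksvm.predict_test
test_score = atom.ksvm.score_test

# Clear the cache again to free memory before pickling the instance.
atom.ksvm.reset_predictions()
```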
@@ -264,8 +266,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.ksvm.plot_permutation_importance()` or `atom.ksvm.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.ksvm.plot_permutation_importance()` or `atom.ksvm.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -304,13 +306,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -329,15 +331,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -352,15 +355,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -405,7 +408,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/lasso.md b/docs_sources/api/models/lasso.md index 1df4838ec..82766fbd6 100644 --- a/docs_sources/api/models/lasso.md +++ b/docs_sources/api/models/lasso.md @@ -15,9 +15,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `random_state` parameter is set equal to that of the trainer.
@@ -135,7 +135,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -147,8 +148,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -160,7 +161,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -170,9 +172,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -193,9 +195,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -227,8 +229,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.lasso.plot_permutation_importance()` - or `atom.lasso.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.lasso.plot_permutation_importance()` +or `atom.lasso.predict(X)`. The remaining utility methods can be found hereunder:<br>

@@ -262,15 +264,16 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
@@ -285,15 +288,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -326,7 +329,7 @@ Model's score for the selected metric.
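A short sketch of how the scoring call might be used for this model; the metric strings are assumed to be names accepted by sklearn's scorers:

```python
# Assumes a fitted Lasso model in an ATOMRegressor instance named `atom`.
print(atom.lasso.scoring())  # score of the trainer's metric on the test set

# A specific metric, and the same metric evaluated on the training set.
print(atom.lasso.scoring("r2"))
print(atom.lasso.scoring("neg_mean_absolute_error", dataset="train"))
```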
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/lda.md b/docs_sources/api/models/lda.md index 3e7e2edd7..adbb25cb4 100644 --- a/docs_sources/api/models/lda.md +++ b/docs_sources/api/models/lda.md @@ -1,10 +1,10 @@ # Linear Discriminant Analysis (LDA) ------------------------------------ -Linear Discriminant Analysis is a classifier with a linear decision boundary, - generated by fitting class conditional densities to the data and using Bayes’ rule. - The model fits a Gaussian density to each class, assuming that all classes share - the same covariance matrix. +Linear Discriminant Analysis is a classifier with a linear decision +boundary, generated by fitting class conditional densities to the data +and using Bayes’ rule. The model fits a Gaussian density to each class, +assuming that all classes share the same covariance matrix. Corresponding estimators are: @@ -18,9 +18,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `shrinkage` parameter is not used when solver = "svd".
@@ -142,7 +142,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -154,8 +155,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -167,7 +168,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -177,9 +179,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -200,9 +202,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -258,8 +260,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.lda.plot_permutation_importance()` - or `atom.lda.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.lda.plot_permutation_importance()` +or `atom.lda.predict(X)`. The remaining utility methods can be found hereunder:<br>

@@ -298,13 +300,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -323,15 +325,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -346,15 +349,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -399,7 +402,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/lgb.md b/docs_sources/api/models/lgb.md index 5f31c4ff7..9acad7b43 100644 --- a/docs_sources/api/models/lgb.md +++ b/docs_sources/api/models/lgb.md @@ -1,8 +1,9 @@ # LightGBM (LGB) ---------------- -LightGBM is a gradient boosting model that uses tree based learning algorithms. It is - designed to be distributed and efficient with the following advantages: +LightGBM is a gradient boosting model that uses tree based learning +algorithms. It is designed to be distributed and efficient with the +following advantages: * Faster training speed and higher efficiency. * Lower memory usage. @@ -19,17 +20,17 @@ Corresponding estimators are: Read more in LightGBM's [documentation](https://lightgbm.readthedocs.io/en/latest/index.html). !!!note - LightGBM allows [early stopping](../../../user_guide/#early-stopping) to stop - the training of unpromising models prematurely! + LightGBM allows [early stopping](../../../user_guide/#early-stopping) + to stop the training of unpromising models prematurely!
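To make the note concrete, a hedged sketch of requesting early stopping through the trainer; the `n_calls` and `bo_params={"early_stopping": ...}` arguments are assumptions about the trainer's API and may differ between versions:

```python
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, test_size=0.2, random_state=1)

# Run LightGBM with Bayesian optimization; the early_stopping key is
# assumed to be the patience, expressed here as a fraction of the rounds.
atom.run(
    models="LGB",
    metric="roc_auc",
    n_calls=15,
    bo_params={"early_stopping": 0.1},
)

# The metric tracked during training is stored in the evals attribute
# described further down this page.
print(atom.lgb.evals["metric"])
```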

## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -182,7 +183,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -194,8 +196,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -207,8 +209,9 @@ Metric score(s) on the test set.
evals: dict
-Dictionary of the metric calculated during training. The metric is provided by the estimator's - package and is different for every task. Available keys are: +Dictionary of the metric calculated during training. The metric is +provided by the estimator's package and is different for every task. +Available keys are:
  • "metric": Name of the metric.
  • "train": List of scores calculated on the training set.
  • @@ -217,7 +220,8 @@ Dictionary of the metric calculated during training. The metric is provided by t
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -227,9 +231,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -250,9 +254,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -300,8 +304,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.lgb.plot_permutation_importance()` or `atom.lgb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.lgb.plot_permutation_importance()` or `atom.lgb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -340,13 +344,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -365,15 +369,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -388,15 +393,15 @@ New tag for the model. If None, the tag is removed.
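A minimal, hedged sketch of retagging a model; how strictly the acronym prefix is enforced is an assumption based on the description above:

```python
# Assumes a fitted LGB model in the atom instance.
# Retag the model; the "LGB" acronym stays at the front of the name.
atom.lgb.rename("LGB_tuned")

# The model now shows up under its new tag in the trainer's overview
# (attribute access by the new, lower-cased name is an assumption).
print(atom.models)
```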
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -441,7 +446,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/lr.md b/docs_sources/api/models/lr.md index 09801b0c0..28698190e 100644 --- a/docs_sources/api/models/lr.md +++ b/docs_sources/api/models/lr.md @@ -1,11 +1,12 @@ # Logistic regression (LR) -------------------------- -Logistic regression, despite its name, is a linear model for classification rather - than regression. Logistic regression is also known in the literature as logit - regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. - In this model, the probabilities describing the possible outcomes of a single trial - are modeled using a logistic function. +Logistic regression, despite its name, is a linear model for +classification rather than regression. Logistic regression is also +known in the literature as logit regression, maximum-entropy +classification (MaxEnt) or the log-linear classifier. In this model, +the probabilities describing the possible outcomes of a single trial +are modeled using a logistic function. Corresponding estimators are: @@ -19,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `penalty` parameter is automatically set to "l2" when penalty = "none" and solver = "liblinear". * The `penalty` parameter is automatically set to "l2" when penalty = "l1" and solver != "liblinear" or "saga". * The `penalty` parameter is automatically set to "l2" when penalty = "elasticnet" and solver != "saga". @@ -161,7 +162,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -173,8 +175,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -186,7 +188,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -196,9 +199,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -219,9 +222,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -277,8 +280,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.lr.plot_permutation_importance()` - or `atom.lr.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.lr.plot_permutation_importance()` +or `atom.lr.predict(X)`. The remaining utility methods can be found hereunder:<br>

@@ -317,13 +320,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
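A minimal sketch of calibrating this model; the `method` and `cv` keywords are CalibratedClassifierCV parameters, and forwarding them through `**kwargs` is assumed from the signature above:

```python
# Assumes a fitted LR model in the atom instance. The kwargs are passed
# on to sklearn's CalibratedClassifierCV.
atom.lr.calibrate(method="isotonic", cv=5)

# The estimator attribute now holds the calibrated classifier and the
# cached prediction attributes have been reset.
print(atom.lr.estimator)
```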
@@ -342,15 +345,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -365,15 +369,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -418,7 +422,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/lsvm.md b/docs_sources/api/models/lsvm.md index 6a28b73e0..47d592cc3 100644 --- a/docs_sources/api/models/lsvm.md +++ b/docs_sources/api/models/lsvm.md @@ -1,9 +1,10 @@ # Linear-SVM (lSVM) ------------------- -Similar to [Kernel-SVM](../ksvm) but with a linear kernel. Implemented in terms of - liblinear rather than libsvm, so it has more flexibility in the choice of penalties - and loss functions and should scale better to large numbers of samples. +Similar to [Kernel-SVM](../ksvm) but with a linear kernel. Implemented +in terms of liblinear rather than libsvm, so it has more flexibility +in the choice of penalties and loss functions and should scale better +to large numbers of samples. The multiclass support is handled according to a one-vs-rest scheme. @@ -21,9 +22,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/s ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `penalty` parameter is only used with LinearSVC. * The `penalty` parameter is always set to "l2" when loss = "hinge". * The `dual` parameter is automatically set to False when penalty = "l1" and loss = "squared_hinge". @@ -155,7 +156,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -167,8 +169,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -180,7 +182,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -190,9 +193,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -213,9 +216,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -255,8 +258,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.lsvm.plot_permutation_importance()` or `atom.lsvm.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.lsvm.plot_permutation_importance()` or `atom.lsvm.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -295,13 +298,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -320,15 +323,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -343,15 +347,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -396,7 +400,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/mlp.md b/docs_sources/api/models/mlp.md index 1b7063b78..052ee5ba0 100644 --- a/docs_sources/api/models/mlp.md +++ b/docs_sources/api/models/mlp.md @@ -1,11 +1,12 @@ # Multi-layer Perceptron (MLP) ------------------------------ -Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function - by training on a dataset. Given a set of features and a target, it can learn a - non-linear function approximator for either classification or regression. It is - different from logistic regression, in that between the input and the output layer, - there can be one or more non-linear layers, called hidden layers. +Multi-layer Perceptron (MLP) is a supervised learning algorithm that +learns a function by training on a dataset. Given a set of features +and a target, it can learn a non-linear function approximator for +either classification or regression. It is different from logistic +regression, in that between the input and the output layer, there can +be one or more non-linear layers, called hidden layers. Corresponding estimators are: @@ -21,9 +22,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The MLP optimizes between one and three hidden layers with the BO. For more layers, use `est_params`. * The `learning_rate` and `power_t` parameters are only used when solver = "lbfgs". * The `learning_rate_init` parameter is only used when solver != "lbfgs". @@ -176,7 +177,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -188,8 +190,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -201,8 +203,9 @@ Metric score(s) on the test set.
evals: dict
-Dictionary of the metric calculated during training. The metric is provided by the estimator's - package and is different for every task. Available keys are: +Dictionary of the metric calculated during training. The metric is +provided by the estimator's package and is different for every task. +Available keys are:
  • "metric": Name of the metric.
  • "train": List of scores calculated on the training set.
  • @@ -211,7 +214,8 @@ Dictionary of the metric calculated during training. The metric is provided by t
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -221,9 +225,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -244,9 +248,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -294,8 +298,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.mlp.plot_permutation_importance()` or `atom.mlp.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.mlp.plot_permutation_importance()` or `atom.mlp.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -334,13 +338,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -359,15 +363,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -382,15 +387,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -435,7 +440,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/mnb.md b/docs_sources/api/models/mnb.md index 1371e829b..061db6d70 100644 --- a/docs_sources/api/models/mnb.md +++ b/docs_sources/api/models/mnb.md @@ -1,10 +1,11 @@ # Multinomial Naive Bayes (MNB) ------------------------------- -Multinomial Naive Bayes implements the Naive Bayes algorithm for multinomially - distributed data, and is one of the two classic Naive Bayes variants used in text - classification (where the data are typically represented as word vector counts, - although tf-idf vectors are also known to work well in practice). +Multinomial Naive Bayes implements the Naive Bayes algorithm for +multinomially distributed data, and is one of the two classic Naive +Bayes variants used in text classification (where the data are +typically represented as word vector counts, although tf-idf vectors +are also known to work well in practice). Corresponding estimators are: @@ -18,9 +19,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.
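To make the customization bullet above concrete, a hedged sketch of overriding the estimator's defaults through the trainer; `est_params` is the hook described in the user guide, and the MultinomialNB parameters shown are standard sklearn ones:

```python
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, test_size=0.2, random_state=1)

# Run Multinomial Naive Bayes with custom estimator parameters instead
# of the package defaults (alpha and fit_prior are sklearn parameters).
atom.run(
    models="MNB",
    metric="f1_weighted",
    est_params={"alpha": 0.1, "fit_prior": False},
)

print(atom.mnb.estimator.get_params()["alpha"])  # 0.1
```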
@@ -141,7 +142,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -153,8 +155,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -166,7 +168,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -176,9 +179,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -199,9 +202,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -249,8 +252,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.mnb.plot_permutation_importance()` or `atom.mnb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.mnb.plot_permutation_importance()` or `atom.mnb.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -289,13 +292,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -314,15 +317,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -337,15 +341,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -390,7 +394,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/ols.md b/docs_sources/api/models/ols.md index 2b1d9aa1a..1554124df 100644 --- a/docs_sources/api/models/ols.md +++ b/docs_sources/api/models/ols.md @@ -1,10 +1,10 @@ # Ordinary Least Squares (OLS) ------------------------------ -Ordinary Least Squares is just linear regression without any regularization. It fits - a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of - squares between the observed targets in the dataset, and the targets predicted by - the linear approximation. +Ordinary Least Squares is just linear regression without any +regularization. It fits a linear model with coefficients w = (w1, …, wp) +to minimize the residual sum of squares between the observed targets in +the dataset, and the targets predicted by the linear approximation. Corresponding estimators are: @@ -18,9 +18,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` parameter is set equal to that of the trainer. * OLS has no parameters to tune with the BO. @@ -115,8 +115,8 @@ Estimator instance fitted on the complete training set. time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -128,7 +128,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -160,9 +161,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -194,8 +195,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.ols.plot_permutation_importance()` or `atom.ols.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.ols.plot_permutation_importance()` or `atom.ols.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -229,15 +230,16 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
@@ -252,15 +254,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -292,7 +294,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/pa.md b/docs_sources/api/models/pa.md index cf06deae1..d0828369d 100644 --- a/docs_sources/api/models/pa.md +++ b/docs_sources/api/models/pa.md @@ -1,9 +1,10 @@ # Passive Aggressive (PA) ------------------------- -The passive-aggressive algorithms are a family of algorithms for large-scale learning. - They are similar to the Perceptron in that they do not require a learning rate. However, - contrary to the Perceptron, they include a regularization parameter C. +The passive-aggressive algorithms are a family of algorithms for +large-scale learning. They are similar to the Perceptron in that they +do not require a learning rate. However, contrary to the Perceptron, +they include a regularization parameter C. Corresponding estimators are: @@ -19,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -151,7 +152,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -163,8 +165,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -176,7 +178,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -186,9 +189,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Entries include:<br>
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -209,9 +212,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -251,8 +254,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.pa.plot_permutation_importance()` or `atom.pa.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.pa.plot_permutation_importance()` or `atom.pa.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -291,13 +294,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -316,15 +319,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -339,15 +343,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -392,7 +396,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/qda.md b/docs_sources/api/models/qda.md index 409dde99b..d630ab921 100644 --- a/docs_sources/api/models/qda.md +++ b/docs_sources/api/models/qda.md @@ -1,10 +1,10 @@ # Quadratic Discriminant Analysis (QDA) --------------------------------------- -Linear Discriminant Analysis is a classifier with a quadratic decision boundary, - generated by fitting class conditional densities to the data and using Bayes’ rule. - The model fits a Gaussian density to each class, assuming that all classes share - the same covariance matrix. +Quadratic Discriminant Analysis is a classifier with a quadratic decision +boundary, generated by fitting class conditional densities to the data and +using Bayes’ rule. The model fits a Gaussian density to each class, without +assuming that all classes share the same covariance matrix. Corresponding estimators are: @@ -18,9 +18,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them.<br>
@@ -137,7 +137,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -149,8 +150,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -162,7 +163,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -172,9 +174,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -195,9 +197,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -253,8 +255,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.qda.plot_permutation_importance()` - or `atom.qda.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.qda.plot_permutation_importance()` +or `atom.qda.predict(X)`. The remaining utility methods can be found hereunder:

@@ -293,13 +295,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset.
@@ -318,15 +320,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -341,15 +344,15 @@ New tag for the model. If None, the tag is removed.
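A minimal sketch of rename (the tag value is illustrative; per the description above, the acronym itself is always kept):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="QDA", metric="f1")

# Only the tag after the acronym changes, e.g. QDA -> QDA2.
atom.qda.rename("2")

# Calling rename(None) would remove the tag again.
```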
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -394,7 +397,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/rf.md b/docs_sources/api/models/rf.md index f18e2119a..74f1f6aab 100644 --- a/docs_sources/api/models/rf.md +++ b/docs_sources/api/models/rf.md @@ -1,10 +1,11 @@ # Random Forest (RF) -------------------- -Random forests are an ensemble learning method that operate by constructing a multitude - of decision trees at training time and outputting the class that is the mode of the - classes (classification) or mean prediction (regression) of the individual trees. - Random forests correct for decision trees" habit of overfitting to their training set. +Random forests are an ensemble learning method that operate by +constructing a multitude of decision trees at training time and +outputting the class that is the mode of the classes (classification) +or mean prediction (regression) of the individual trees. Random forests +correct for decision trees" habit of overfitting to their training set. Corresponding estimators are: @@ -20,9 +21,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/e ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `max_samples` parameter is only used when bootstrap = True. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -192,7 +193,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -204,8 +206,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -217,7 +219,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -235,9 +238,9 @@ Standard deviation of the bagging's results. List of values for multi-metric run ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
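To illustrate the lazy evaluation described above, a small sketch (the attribute names predict_test and score_test are taken from the predicting section of the user guide and are assumptions here):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="RF", metric="f1")

# Nothing is computed yet; the first access triggers the calculation
# and the result is cached for later use.
print(atom.rf.predict_test[:5])  # predictions on the test set
print(atom.rf.score_test)        # model's score on the test set

# reset_predictions() clears the cached attributes again.
atom.rf.reset_predictions()
```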
@@ -285,8 +288,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.rf.plot_permutation_importance()` or `atom.rf.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.rf.plot_permutation_importance()` or `atom.rf.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -325,13 +328,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -350,15 +353,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -373,15 +377,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -426,7 +430,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/ridge.md b/docs_sources/api/models/ridge.md index 9f43df347..063b58407 100644 --- a/docs_sources/api/models/ridge.md +++ b/docs_sources/api/models/ridge.md @@ -17,9 +17,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/l ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `random_state` parameter is set equal to that of the trainer.
@@ -137,7 +137,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -149,8 +150,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -162,7 +163,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -172,9 +174,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -195,9 +197,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -245,8 +247,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the model, e.g. `atom.ridge.plot_permutation_importance()` - or `atom.ridge.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the model, e.g. `atom.ridge.plot_permutation_importance()` +or `atom.ridge.predict(X)`. The remaining utility methods can be found hereunder:

@@ -285,13 +287,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -310,15 +312,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -333,15 +336,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -386,7 +389,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/rnn.md b/docs_sources/api/models/rnn.md index 171ae4d09..ab11d9e26 100644 --- a/docs_sources/api/models/rnn.md +++ b/docs_sources/api/models/rnn.md @@ -1,10 +1,10 @@ # Radius Nearest Neighbors (RNN) -------------------------------- -Radius Nearest Neighbors implements the nearest neighbors vote, where the neighbors - are selected from within a given radius. For regression, the target is predicted - by local interpolation of the targets associated of the nearest neighbors in the - training set. +Radius Nearest Neighbors implements the nearest neighbors vote, where +the neighbors are selected from within a given radius. For regression, +the target is predicted by local interpolation of the targets associated +of the nearest neighbors in the training set. Corresponding estimators are: @@ -20,9 +20,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `outlier_label` parameter is set by default to "most_frequent" to avoid errors when encountering outliers. * The `n_jobs` parameter is set equal to that of the trainer. @@ -34,12 +34,13 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/n
Real(min(distances), max(distances), name="radius") -Since the optimal radius depends hugely on the data, ATOM's RNN implementation - doesn't use sklearn's default radius of 1, but instead calculates the [minkowsky - distance](https://en.wikipedia.org/wiki/Minkowski_distance) between 10% of random - samples in the training set and uses the mean of those distances as default radius. - The lower and upper bounds of the radius" dimensions for the BO are given by the - minimum and maximum value of the calculated distances. +Since the optimal radius depends heavily on the data, ATOM's RNN +implementation doesn't use sklearn's default radius of 1, but instead +calculates the [Minkowski distance](https://en.wikipedia.org/wiki/Minkowski_distance) +between 10% of random samples in the training set and uses the mean of +those distances as the default radius. The lower and upper bounds of the +radius' dimensions for the BO are given by the minimum and maximum +value of the calculated distances.
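The paragraph above describes how the default radius is derived from the data; a standalone sketch of that idea (illustrative only, not ATOM's actual implementation):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import pairwise_distances

X, _ = load_breast_cancer(return_X_y=True)

# Take 10% of the samples at random and compute their pairwise
# Minkowski (here Euclidean, p=2) distances.
rng = np.random.default_rng(1)
sample = X[rng.choice(len(X), size=max(2, int(0.10 * len(X))), replace=False)]
dist = pairwise_distances(sample, metric="minkowski")
dist = dist[np.triu_indices_from(dist, k=1)]  # keep each pair once

default_radius = dist.mean()           # used instead of sklearn's 1
lower, upper = dist.min(), dist.max()  # bounds of the BO dimension
print(default_radius, lower, upper)
```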
weights: str, default="uniform"
@@ -162,7 +163,8 @@ Dictionary of the best combination of hyperparameters found by the BO.
estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -174,8 +176,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -187,7 +189,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -197,9 +200,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -220,9 +223,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -270,8 +273,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.rnn.plot_permutation_importance()` - or `atom.rnn.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.rnn.plot_permutation_importance()` +or `atom.rnn.predict(X)`. The remaining utility methods can be found hereunder:

@@ -310,13 +313,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -335,15 +338,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -358,15 +362,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -411,7 +415,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/sgd.md b/docs_sources/api/models/sgd.md index 593fd5131..d160248fd 100644 --- a/docs_sources/api/models/sgd.md +++ b/docs_sources/api/models/sgd.md @@ -1,10 +1,11 @@ # Stochastic Gradient Descent (SGD) ----------------------------------- -Stochastic Gradient Descent is a simple yet very efficient approach to fitting linear - classifiers and regressors under convex loss functions. Even though SGD has been - around in the machine learning community for a long time, it has received a - considerable amount of attention just recently in the context of large-scale learning. +Stochastic Gradient Descent is a simple yet very efficient approach to +fitting linear classifiers and regressors under convex loss functions. +Even though SGD has been around in the machine learning community for a +long time, it has received a considerable amount of attention just +recently in the context of large-scale learning. Corresponding estimators are: @@ -20,9 +21,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/s ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `l1_ratio` parameter is only used when penalty = "elasticnet". * The `eta0` parameter is only used when learning_rate != "optimal". * The `n_jobs` and `random_state` parameters are set equal to those of the @@ -178,7 +179,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -190,8 +192,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -203,7 +205,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -213,9 +216,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -236,9 +239,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -278,8 +281,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.sgd.plot_permutation_importance()` or `atom.sgd.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.sgd.plot_permutation_importance()` or `atom.sgd.predict(X)`. +The remaining utility methods can be found hereunder:

@@ -318,13 +321,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -343,15 +346,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -366,15 +370,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -419,7 +423,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/tree.md b/docs_sources/api/models/tree.md index c8dff58de..bc568c7a8 100644 --- a/docs_sources/api/models/tree.md +++ b/docs_sources/api/models/tree.md @@ -17,9 +17,9 @@ Read more in sklearn's [documentation](https://scikit-learn.org/stable/modules/t ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `random_state` parameter is set equal to that of the trainer.
@@ -164,7 +164,8 @@ Dictionary of the best combination of hyperparameters found by the BO. estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -176,8 +177,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -189,7 +190,8 @@ Metric score(s) on the test set.
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -199,9 +201,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -222,9 +224,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory.
@@ -272,8 +274,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.tree.plot_permutation_importance()` - or `atom.tree.predict(X)`. The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.tree.plot_permutation_importance()` +or `atom.tree.predict(X)`. The remaining utility methods can be found hereunder:

@@ -312,13 +314,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
method calibrate(**kwargs)
-
-Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
@@ -337,15 +339,16 @@ test set. Use this only if you have another, independent set for testing.
method delete()
-
+ Delete the model from the trainer.


method rename(name=None)
-
-Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
Parameters:
@@ -360,15 +363,15 @@ New tag for the model. If None, the tag is removed.
method reset_predictions()
-
+ Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


method scoring(metric=None, dataset="test", **kwargs)
-
+ Get the scoring for a specific metric.
Parameters:
@@ -413,7 +416,7 @@ Model's score for the selected metric.
method save_estimator(filename=None)
-
+ Save the estimator to a pickle file.
diff --git a/docs_sources/api/models/xgb.md b/docs_sources/api/models/xgb.md index 1a503f420..6f81c0bae 100644 --- a/docs_sources/api/models/xgb.md +++ b/docs_sources/api/models/xgb.md @@ -1,9 +1,10 @@ # XGBoost (XGB) --------------- -XGBoost is an optimized distributed gradient boosting model designed to be highly - efficient, flexible and portable. XGBoost provides a parallel tree boosting that - solve many data science problems in a fast and accurate way. +XGBoost is an optimized distributed gradient boosting model designed to +be highly efficient, flexible and portable. XGBoost provides a parallel +tree boosting that solve many data science problems in a fast and +accurate way. Corresponding estimators are: @@ -15,8 +16,8 @@ Corresponding estimators are: Read more in XGBoost's [documentation](https://xgboost.readthedocs.io/en/latest/index.html). !!!note - XGBoost allows [early stopping](../../../user_guide/#early-stopping) to stop - the training of unpromising models prematurely! + XGBoost allows [early stopping](../../../user_guide/#early-stopping) + to stop the training of unpromising models prematurely! @@ -24,9 +25,9 @@ Read more in XGBoost's [documentation](https://xgboost.readthedocs.io/en/latest/ ## Hyperparameters ------------------ -* By default, the estimator adopts the default parameters provided by its package. - See the [user guide](../../../user_guide/#parameter-customization) on how to - customize them. +* By default, the estimator adopts the default parameters provided by + its package. See the [user guide](../../../user_guide/#parameter-customization) + on how to customize them. * The `n_jobs` and `random_state` parameters are set equal to those of the trainer. @@ -81,12 +82,74 @@ Categorical([0, 0.01, 0.1, 1, 10, 100], name="reg_lambda") ### Data attributes -You can use the same [data attributes](../../ATOM/atomclassifier#data-attributes) - as the trainers to check the dataset that was used to fit a particular - model. These can differ from each other if the model needs scaled features and the - data wasn't already scaled. Note that, unlike with the `training` instances, these - attributes not be updated (i.e. they have no `@setter`). -

+
+ + + +
Attributes: +dataset: pd.DataFrame +
+Complete dataset in the pipeline. +
+train: pd.DataFrame +
+Training set. +
+test: pd.DataFrame +
+Test set. +
+X: pd.DataFrame +
+Feature set. +
+y: pd.Series +
+Target column. +
+X_train: pd.DataFrame +
+Training features. +
+y_train: pd.Series +
+Training target. +
+X_test: pd.DataFrame +
+Test features. +
+y_test: pd.Series +
+Test target. +
+shape: tuple +
+Dataset's shape: (n_rows x n_columns) or +(n_rows, (shape_sample), n_cols) for deep learning datasets. +
+columns: list +
+Names of the columns in the dataset. +
+n_columns: int +
+Number of columns in the dataset. +
+features: list +
+Names of the features in the dataset. +
+n_features: int +
+Number of features in the dataset. +
+target: str +
+Name of the target column. +
+
+
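A quick sketch of inspecting these per-model data attributes (the setup is illustrative and requires the xgboost package):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="XGB", metric="f1")

# The model's own view of the data; it can differ from atom's data
# if the model required scaled features.
print(atom.xgb.X_train.shape, atom.xgb.X_test.shape)
print(atom.xgb.columns[:5])
print(atom.xgb.target)
```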
### Utility attributes @@ -113,7 +176,8 @@ Dictionary of the best combination of hyperparameters found by the BO.
estimator: class
-Estimator instance with the best combination of hyperparameters fitted on the complete training set. +Estimator instance with the best combination of hyperparameters fitted +on the complete training set.
time_bo: str
@@ -125,8 +189,8 @@ Best metric score(s) on the BO.
time_fit: str
-Time it took to train the model on the complete training set and calculate the - metric(s) on the test set. +Time it took to train the model on the complete training set and +calculate the metric(s) on the test set.
metric_train: float or list
@@ -138,8 +202,9 @@ Metric score(s) on the test set.
evals: dict
-Dictionary of the metric calculated during training. The metric is provided by the estimator's - package and is different for every task. Available keys are: +Dictionary of the metric calculated during training. The metric is +provided by the estimator's package and is different for every task. +Available keys are:
  • "metric": Name of the metric.
  • "train": List of scores calculated on the training set.
  • @@ -148,7 +213,8 @@ Dictionary of the metric calculated during training. The metric is provided by t
metric_bagging: list
-Bagging's results with shape=(bagging,) for single-metric runs and shape=(metric, bagging) for multi-metric runs. +Bagging's results with shape=(bagging,) for single-metric runs and +shape=(metric, bagging) for multi-metric runs.
mean_bagging: float or list
@@ -158,9 +224,9 @@ Mean of the bagging's results. List of values for multi-metric runs.
Standard deviation of the bagging's results. List of values for multi-metric runs.
-results: pd.DataFrame +results: pd.Series
-Dataframe of the training results with the model acronym as index. Columns can include: +Series of the training results. Rows include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • @@ -181,9 +247,9 @@ Dataframe of the training results with the model acronym as index. Columns can i ### Prediction attributes -The prediction attributes are not calculated until the attribute is called for the - first time. This mechanism avoids having to calculate attributes that are never - used, saving time and memory. +The prediction attributes are not calculated until the attribute is +called for the first time. This mechanism avoids having to calculate +attributes that are never used, saving time and memory. @@ -231,8 +297,8 @@ Model's score on the test set. ---------- The majority of the [plots](../../../user_guide/#plots) and [prediction methods](../../../user_guide/#predicting) - can be called directly from the models, e.g. `atom.xgb.plot_permutation_importance()` or `atom.xgb.predict(X)`. - The remaining utility methods can be found hereunder: +can be called directly from the models, e.g. `atom.xgb.plot_permutation_importance()` or `atom.xgb.predict(X)`. +The remaining utility methods can be found hereunder:

    @@ -271,13 +337,13 @@ The majority of the [plots](../../../user_guide/#plots) and [prediction methods]
    method calibrate(**kwargs)
    -
    -Applies probability calibration on the estimator. The calibration is done using the - [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) - class from sklearn. The calibrator is trained via cross-validation on a subset - of the training data, using the rest to fit the calibrator. The new classifier will - replace the `estimator` attribute. After calibrating, all prediction attributes will - reset. Only if classifier. + +Applies probability calibration on the estimator. The calibration is done +using the [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) +class from sklearn. The calibrator is trained via cross-validation on a +subset of the training data, using the rest to fit the calibrator. The new +classifier will replace the `estimator` attribute. After calibrating, all +prediction attributes will reset. Only if classifier.
    @@ -296,15 +362,16 @@ test set. Use this only if you have another, independent set for testing.
    method delete()
    -
    + Delete the model from the trainer.


    method rename(name=None)
    -
    -Change the model's tag. Note that the acronym always stays at the beginning of the model's name. + +Change the model's tag. Note that the acronym always stays at the +beginning of the model's name.
    Parameters:
    @@ -319,15 +386,15 @@ New tag for the model. If None, the tag is removed.
    method reset_predictions()
    -
    + Clear all the [prediction attributes](../../../user_guide/#predicting). - Use this method to free some memory before saving the model. +Use this method to free some memory before saving the model.


    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Get the scoring for a specific metric.
    Parameters:
    @@ -372,7 +439,7 @@ Model's score for the selected metric.
    method save_estimator(filename=None)
    -
    + Save the estimator to a pickle file.
    diff --git a/docs_sources/api/plots/bar_plot.md b/docs_sources/api/plots/bar_plot.md index 28660b5ad..643a8df61 100644 --- a/docs_sources/api/plots/bar_plot.md +++ b/docs_sources/api/plots/bar_plot.md @@ -3,7 +3,7 @@
    method bar_plot(models=None, index=None, show=None, target=1,
                     title=None, figsize=None, filename=None, display=True, **kwargs)
    -
    + Plot SHAP's bar plot. Create a bar plot of a set of SHAP values. If a single sample is passed, then the SHAP values are plotted. If many samples are passed, then the mean absolute value for each feature @@ -15,9 +15,9 @@ column is plotted. Read more about SHAP plots in the
    models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: int, tuple, slice or None, optional (default=None)
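A minimal sketch of calling this plot (data and model choices are illustrative; per the note above, with several models in the pipeline the plot should be called from a single model):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["LR", "RF"], metric="f1")

# Called from atom with two models fitted this would raise an
# exception, so call it from one model instead.
atom.rf.bar_plot(index=0, show=10)  # SHAP values for one test sample
atom.rf.bar_plot(show=10)           # mean |SHAP| value per feature
```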
    diff --git a/docs_sources/api/plots/beeswarm_plot.md b/docs_sources/api/plots/beeswarm_plot.md index 17b7dd4eb..28ea0749b 100644 --- a/docs_sources/api/plots/beeswarm_plot.md +++ b/docs_sources/api/plots/beeswarm_plot.md @@ -3,7 +3,7 @@
    method beeswarm_plot(models=None, index=None, show=None, target=1,
                          title=None, figsize=None, filename=None, display=True, **kwargs)
    -
    + Plot SHAP's beeswarm plot. The plot is colored by feature values. Read more about SHAP plots in the [user guide](../../../user_guide/#shap). @@ -12,9 +12,9 @@ Read more about SHAP plots in the [user guide](../../../user_guide/#shap).
    models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: tuple, slice or None, optional (default=None)
    diff --git a/docs_sources/api/plots/decision_plot.md b/docs_sources/api/plots/decision_plot.md index 96fd1440e..5ee405707 100644 --- a/docs_sources/api/plots/decision_plot.md +++ b/docs_sources/api/plots/decision_plot.md @@ -3,7 +3,7 @@
    method decision_plot(models=None, index=None, show=None, target=1,
                          title=None, figsize=None, filename=None, display=True, **kwargs)
    -
    + Plot SHAP's decision plot. Visualize model decisions using cumulative SHAP values. Each plotted line explains a single model prediction. If a single prediction is plotted, feature values will be printed in the plot (if supplied). If multiple @@ -16,9 +16,9 @@ SHAP plots in the [user guide](../../../user_guide/#shap).
    models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: int, tuple, slice or None, optional (default=None)
    @@ -31,8 +31,8 @@ Number of features (ordered by importance) to show in the plot. None to show all
    target: int or str, optional (default=1)
    -Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
    title: str or None, optional (default=None)
    diff --git a/docs_sources/api/plots/force_plot.md b/docs_sources/api/plots/force_plot.md index 29b12351d..d39515c49 100644 --- a/docs_sources/api/plots/force_plot.md +++ b/docs_sources/api/plots/force_plot.md @@ -3,7 +3,7 @@
    method force_plot(models=None, index=None, target=1,
                       title=None, figsize=(14, 6), filename=None, display=True, **kwargs)
    -
    + Plot SHAP's force plot. Visualize the given SHAP values with an additive force layout. Note that by default this plot will render using javascript. For a regular figure use `matplotlib=True` (this option is only available @@ -15,9 +15,9 @@ when only a single sample is plotted). Read more about SHAP plots in the
    models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: int, tuple, slice or None, optional (default=None)
    @@ -26,8 +26,8 @@ n until m. If None, it selects all rows in the test set.
    target: int or str, optional (default=1)
    -Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
    title: str or None, optional (default=None)
    diff --git a/docs_sources/api/plots/heatmap_plot.md b/docs_sources/api/plots/heatmap_plot.md index e97356364..2868eecd6 100644 --- a/docs_sources/api/plots/heatmap_plot.md +++ b/docs_sources/api/plots/heatmap_plot.md @@ -3,7 +3,7 @@
    method heatmap_plot(models=None, index=None, show=None, target=1,
                         title=None, figsize=(8, 6), filename=None, display=True, **kwargs)
    -
    + Plot SHAP's heatmap plot. This plot is designed to show the population substructure of a dataset using supervised clustering and a heatmap. Supervised clustering involves clustering data points not by their original @@ -15,9 +15,9 @@ feature values but by their explanations. Read more about SHAP plots in the
    models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: tuple, slice or None, optional (default=None)
    @@ -31,8 +31,8 @@ Number of features (ordered by importance) to show in the plot. None to show all
    target: int or str, optional (default=1)
    -Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
    title: str or None, optional (default=None)
    diff --git a/docs_sources/api/plots/plot_bo.md b/docs_sources/api/plots/plot_bo.md index 6bd2b1e78..5e21cc78e 100644 --- a/docs_sources/api/plots/plot_bo.md +++ b/docs_sources/api/plots/plot_bo.md @@ -2,7 +2,7 @@ ---------
    method plot_bo(models=None, metric=0, title=None, figsize=(10, 8), filename=None, display=True)
    -
    + Plot the bayesian optimization scoring. Only for models that ran the hyperparameter optimization. This is the same plot as the one produced by `bo_params={"plot_bo": True}` while running the optimization. Creates a canvas with two plots: the first plot shows diff --git a/docs_sources/api/plots/plot_calibration.md b/docs_sources/api/plots/plot_calibration.md index 2b95d95cb..200e38169 100644 --- a/docs_sources/api/plots/plot_calibration.md +++ b/docs_sources/api/plots/plot_calibration.md @@ -2,7 +2,7 @@ ------------------
    method plot_calibration(models=None, n_bins=10, title=None, figsize=(10, 10), filename=None, display=True)
    -
    + Plot the calibration curve for a binary classifier. Well calibrated classifiers are probabilistic classifiers for which the output of the `predict_proba` method can be directly interpreted as a diff --git a/docs_sources/api/plots/plot_components.md b/docs_sources/api/plots/plot_components.md index edefe891f..6c8c0a3a4 100644 --- a/docs_sources/api/plots/plot_components.md +++ b/docs_sources/api/plots/plot_components.md @@ -2,7 +2,7 @@ -----------------
    method plot_components(show=None, title=None, figsize=None, filename=None, display=True)
    -
    + Plot the explained variance ratio per components. Only available if PCA was applied on the data. diff --git a/docs_sources/api/plots/plot_confusion_matrix.md b/docs_sources/api/plots/plot_confusion_matrix.md index ed3432fa6..f96612eb6 100644 --- a/docs_sources/api/plots/plot_confusion_matrix.md +++ b/docs_sources/api/plots/plot_confusion_matrix.md @@ -3,7 +3,7 @@
    method plot_confusion_matrix(models=None, dataset="test", normalize=False,
                                  title=None, figsize=None, filename=None, display=True)
    -
    + Plot a model's confusion matrix. Only for classification tasks. * For 1 model: plot the confusion matrix in a heatmap. diff --git a/docs_sources/api/plots/plot_correlation.md b/docs_sources/api/plots/plot_correlation.md index 0f2ffe314..bc1f2a22d 100644 --- a/docs_sources/api/plots/plot_correlation.md +++ b/docs_sources/api/plots/plot_correlation.md @@ -3,7 +3,7 @@
    method plot_correlation(columns=None, method="pearson", title=None, figsize=(8, 7), filename=None, display=True)
    -
    + Plot the data's correlation matrix.
    diff --git a/docs_sources/api/plots/plot_distribution.md b/docs_sources/api/plots/plot_distribution.md index c2ce93071..f46c5a024 100644 --- a/docs_sources/api/plots/plot_distribution.md +++ b/docs_sources/api/plots/plot_distribution.md @@ -4,7 +4,7 @@
    method plot_distribution(columns=0, distribution=None, show=None,
                              title=None, figsize=None, filename=None, display=True, **kwargs)
    -
    + Plot column distributions. Additionally, it is possible to plot any of `scipy.stats` probability distributions fitted to the column. Missing values are ignored. diff --git a/docs_sources/api/plots/plot_errors.md b/docs_sources/api/plots/plot_errors.md index 768120220..c2223d1c4 100644 --- a/docs_sources/api/plots/plot_errors.md +++ b/docs_sources/api/plots/plot_errors.md @@ -2,7 +2,7 @@ -------------
    method plot_errors(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
    + Plot a model's prediction errors, i.e. the actual targets from a set against the predicted values generated by the regressor. A linear fit is made on the data. The gray, intersected line shows the identity line. This pot can be useful to detect diff --git a/docs_sources/api/plots/plot_evals.md b/docs_sources/api/plots/plot_evals.md index 341dcd76a..bba21d3a7 100644 --- a/docs_sources/api/plots/plot_evals.md +++ b/docs_sources/api/plots/plot_evals.md @@ -2,7 +2,7 @@ ------------
    method plot_evals(models=None, dataset="both", title=None, figsize=(10, 6), filename=None, display=True)
    -
    + Plot evaluation curves for the train and test set. Only for models that allow in-training evaluation ([XGB](../../models/xgb), [LGB](../../models/lgb), [CatB](../../models/catb)). The metric is provided by the estimator's diff --git a/docs_sources/api/plots/plot_feature_importance.md b/docs_sources/api/plots/plot_feature_importance.md index 9c0c6f662..1cfb945ae 100644 --- a/docs_sources/api/plots/plot_feature_importance.md +++ b/docs_sources/api/plots/plot_feature_importance.md @@ -2,10 +2,11 @@ -------------------------
    method plot_feature_importance(models=None, show=None, title=None, figsize=None, filename=None, display=True)
    -
    -Plot a tree-based model's feature importance. The importances are normalized in order -to be able to compare them between models. The `feature_importance` attribute is -updated with the extracted importance ranking. + +Plot a tree-based model's feature importance. The importances are +normalized in order to be able to compare them between models. The +`feature_importance` attribute is updated with the extracted importance +ranking.
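A sketch combining the two importance plots documented here (model choices are illustrative; the permutation variant also works for non tree-based models but is slower):

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["RF", "ET"], metric="f1")  # two tree-based models

# Normalized importances make the models comparable.
atom.plot_feature_importance(models=["RF", "ET"], show=10)

# Permutation importances are cached under `permutations`, so repeating
# the call with the same n_repeats is much faster.
atom.plot_permutation_importance(models="RF", n_repeats=5, show=10)
```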
    diff --git a/docs_sources/api/plots/plot_gains.md b/docs_sources/api/plots/plot_gains.md index 7545d722f..d89ce9e12 100644 --- a/docs_sources/api/plots/plot_gains.md +++ b/docs_sources/api/plots/plot_gains.md @@ -2,7 +2,7 @@ ------------
    method plot_gains(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
    + Plot the cumulative gains curve. Only for binary classification tasks.
    Parameters:
    diff --git a/docs_sources/api/plots/plot_learning_curve.md b/docs_sources/api/plots/plot_learning_curve.md index 3b7349efb..1ee16cae2 100644 --- a/docs_sources/api/plots/plot_learning_curve.md +++ b/docs_sources/api/plots/plot_learning_curve.md @@ -2,7 +2,7 @@ ---------------------
    method plot_learning_curve(models=None, metric=0, title=None, figsize=(10, 6), filename=None, display=True)
    -
    + Plot the model's learning curve: score vs number of training samples. Only available if the models were fitted using [train sizing](../../../user_guide/#train-sizing).
    diff --git a/docs_sources/api/plots/plot_lift.md b/docs_sources/api/plots/plot_lift.md index acd8b0ff5..00ef184d1 100644 --- a/docs_sources/api/plots/plot_lift.md +++ b/docs_sources/api/plots/plot_lift.md @@ -2,7 +2,7 @@ ------------
    method plot_lift(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
    + Plot the lift curve. Only for binary classification.
    diff --git a/docs_sources/api/plots/plot_partial_dependence.md b/docs_sources/api/plots/plot_partial_dependence.md index afdac4006..10269dda5 100644 --- a/docs_sources/api/plots/plot_partial_dependence.md +++ b/docs_sources/api/plots/plot_partial_dependence.md @@ -3,13 +3,14 @@
    method plot_partial_dependence(models=None, features=None, target=None,
                                    title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot the partial dependence of features. The partial dependence of a feature (or a - set of features) corresponds to the average response of the model for each possible - value of the feature. Two-way partial dependence plots are plotted as contour plots - (only allowed for single model plots). The deciles of the feature values will be - shown with tick marks on the x-axes for one-way plots, and on both axes for two-way - plots. + +Plot the partial dependence of features. The partial dependence of a +feature (or a set of features) corresponds to the average response of +the model for each possible value of the feature. Two-way partial +dependence plots are plotted as contour plots (only allowed for single +model plots). The deciles of the feature values will be shown with tick +marks on the x-axes for one-way plots, and on both axes for two-way +plots.
    @@ -26,8 +27,8 @@ attribute is defined else it uses the first 3 features in the dataset. target: int or str, optional (default=1)
    -Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
    title: str or None, optional (default=None)
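A hedged example of the one-way vs two-way distinction described above, assuming sklearn's breast-cancer feature names, the lowercase model accessor (`atom.rf`) and that a 2-tuple of features produces the two-way contour plot; only the `features` parameter itself comes from the documented signature.

```python
# Sketch only: data, model accessor and feature names are illustrative;
# a 2-tuple is assumed to trigger the two-way contour plot described above.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="RF", metric="f1")

# One-way plots for two features (deciles marked on the x-axes)
atom.plot_partial_dependence(features=["mean radius", "mean texture"])

# Two-way contour plot, called from a single model
atom.rf.plot_partial_dependence(features=[("mean radius", "mean texture")])
```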
    diff --git a/docs_sources/api/plots/plot_pca.md b/docs_sources/api/plots/plot_pca.md index 57b86d727..92846e90b 100644 --- a/docs_sources/api/plots/plot_pca.md +++ b/docs_sources/api/plots/plot_pca.md @@ -2,9 +2,9 @@ ----------
    method plot_pca(title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot the explained variance ratio vs the number of components. Only available if PCA - was applied on the data. + +Plot the explained variance ratio vs the number of components. Only +available if PCA was applied on the data.
    Parameters:
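Since the plot is only available after PCA was applied, a sketch of that precondition; `scale()` and `feature_selection(strategy="PCA", ...)` are assumptions about the rest of the API, `plot_pca()` itself is the documented call.

```python
# Sketch only: scale() and feature_selection(strategy="PCA") are assumed
# to be how PCA gets applied before plot_pca() becomes available.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.scale()                                           # PCA is scale-sensitive
atom.feature_selection(strategy="PCA", n_features=12)

# Explained variance ratio vs number of components
atom.plot_pca()
```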
    diff --git a/docs_sources/api/plots/plot_permutation_importance.md b/docs_sources/api/plots/plot_permutation_importance.md index aedb772a0..6120ba540 100644 --- a/docs_sources/api/plots/plot_permutation_importance.md +++ b/docs_sources/api/plots/plot_permutation_importance.md @@ -3,12 +3,13 @@
    method plot_permutation_importance(models=None, show=None, n_repeats=10,
                                        title=None, figsize=None, filename=None, display=True)
    -
    -Plot the feature permutation importance of models. Calculating all permutations can - be time-consuming, especially if `n_repeats` is high. They are stored under - the attribute `permutations`. This means that if a plot is repeated for - the same model with the same `n_repeats`, it will be considerably faster. - The `feature_importance` attribute is updated with the extracted importance ranking. + +Plot the feature permutation importance of models. Calculating all +permutations can be time-consuming, especially if `n_repeats` is high. +They are stored under the attribute `permutations`. This means that if +a plot is repeated for the same model with the same `n_repeats`, it +will be considerably faster. The `feature_importance` attribute is +updated with the extracted importance ranking.
    Parameters:
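A sketch of the caching behaviour described above; the fitting arguments are assumptions, `show` and `n_repeats` come from the documented signature.

```python
# Sketch only: run() arguments are assumed.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["LR", "RF"], metric="f1")

# First call computes the permutations (stored under `permutations`);
# repeating it with the same n_repeats is much faster.
atom.plot_permutation_importance(show=10, n_repeats=7)
atom.plot_permutation_importance(show=10, n_repeats=7)
```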
    diff --git a/docs_sources/api/plots/plot_pipeline.md b/docs_sources/api/plots/plot_pipeline.md index 67b3b16a2..3c9b696a0 100644 --- a/docs_sources/api/plots/plot_pipeline.md +++ b/docs_sources/api/plots/plot_pipeline.md @@ -2,7 +2,7 @@ ---------------
    method plot_pipeline(show_params=True, branch=None, title=None, figsize=None, filename=None, display=True)
    -
    + Plot a diagram of every estimator in a branch.
    Parameters:
    diff --git a/docs_sources/api/plots/plot_prc.md b/docs_sources/api/plots/plot_prc.md index 8203b5090..eac4228a3 100644 --- a/docs_sources/api/plots/plot_prc.md +++ b/docs_sources/api/plots/plot_prc.md @@ -2,9 +2,9 @@ -----------
    method plot_prc(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot the precision-recall curve. The legend shows the average precision (AP) score. -Only for binary classification tasks. + +Plot the precision-recall curve. The legend shows the average +precision (AP) score. Only for binary classification tasks.
    diff --git a/docs_sources/api/plots/plot_probabilities.md b/docs_sources/api/plots/plot_probabilities.md index 8c7b28e0d..77413dcde 100644 --- a/docs_sources/api/plots/plot_probabilities.md +++ b/docs_sources/api/plots/plot_probabilities.md @@ -3,8 +3,9 @@
    method plot_probabilities(models=None, dataset="test", target=1,
                               title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot the probability distribution of the classes in the target column. Only for classification tasks. + +Plot the probability distribution of the classes in the target column. +Only for classification tasks.
    Parameters:
    diff --git a/docs_sources/api/plots/plot_residuals.md b/docs_sources/api/plots/plot_residuals.md index 6981263f7..bb7e73d9a 100644 --- a/docs_sources/api/plots/plot_residuals.md +++ b/docs_sources/api/plots/plot_residuals.md @@ -2,14 +2,14 @@ ----------------
    method plot_residuals(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
+ The plot shows the residuals (difference between the predicted and the true value) on the vertical axis and the independent variable on the -horizontal axis. The gray, intersected line shows the identity line. This -plot can be useful to analyze the variance of the error of the regressor. -If the points are randomly dispersed around the horizontal axis, a linear -regression model is appropriate for the data; otherwise, a non-linear model -is more appropriate. Only for regression tasks. +horizontal axis. The gray, intersected line shows the identity line. +This plot is useful to analyze the variance of the regressor's error. +If the points are randomly dispersed around the horizontal axis, a +linear regression model is appropriate for the data; otherwise, a +non-linear model is more appropriate. Only for regression tasks.
    Parameters:
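Because this plot is regression-only, a hedged sketch with `ATOMRegressor` and sklearn's diabetes data; the constructor and `run` arguments are assumptions.

```python
# Sketch only for a regression task: constructor/run arguments are assumed.
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)

atom = ATOMRegressor(X, y, random_state=1)
atom.run(models=["OLS", "RF"], metric="r2")

# Random scatter around the horizontal axis suggests a linear model fits;
# structure in the residuals points to a non-linear model.
atom.plot_residuals(dataset="test")
```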
    diff --git a/docs_sources/api/plots/plot_results.md b/docs_sources/api/plots/plot_results.md index 2e20aa78d..87f2f9b43 100644 --- a/docs_sources/api/plots/plot_results.md +++ b/docs_sources/api/plots/plot_results.md @@ -2,13 +2,12 @@ --------------
    method plot_results(models=None, metric=0, title=None, figsize=None, filename=None, display=True)
    -
    -Plot of the model results after the evaluation. -If all models applied bagging, the plot is a boxplot. -If not, the plot is a barplot. Models are ordered based -on their score from the top down. The score is either the -`mean_bagging` or `metric_test` attribute of the model, -selected in that order. + +Plot of the model results after the evaluation. If all models applied +bagging, the plot is a boxplot. If not, the plot is a barplot. Models +are ordered based on their score from the top down. The score is either +the `mean_bagging` or `metric_test` attribute of the model, selected in +that order.
    Parameters:
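A sketch of the boxplot-vs-barplot distinction; the `bagging` argument to `run` is an assumption (it matches the `mean_bagging` attribute mentioned above), the plot call itself is documented.

```python
# Sketch only: the bagging argument to run() is assumed.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["LR", "Tree", "RF"], metric="f1", bagging=5)

# With bagged scores the plot is a boxplot, ordered best to worst;
# without bagging it would be a barplot of metric_test.
atom.plot_results(metric=0)
```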
    diff --git a/docs_sources/api/plots/plot_rfecv.md b/docs_sources/api/plots/plot_rfecv.md index 719b81e81..eb4ba8f16 100644 --- a/docs_sources/api/plots/plot_rfecv.md +++ b/docs_sources/api/plots/plot_rfecv.md @@ -2,9 +2,10 @@ ------------
    method plot_rfecv(title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot the RFECV results, i.e. the scores obtained by the estimator fitted on every -subset of the dataset. Only available if RFECV was applied on the data. + +Plot the RFECV results, i.e. the scores obtained by the estimator +fitted on every subset of the dataset. Only available if RFECV was +applied on the data.
    Parameters:
    diff --git a/docs_sources/api/plots/plot_roc.md b/docs_sources/api/plots/plot_roc.md index eea817d95..4039643cb 100644 --- a/docs_sources/api/plots/plot_roc.md +++ b/docs_sources/api/plots/plot_roc.md @@ -3,9 +3,9 @@
    method plot_roc(models=None, dataset="test", title=None, figsize=(10, 6), filename=None, display=True)
    -
-Plot the Receiver Operating Characteristics curve. The legend shows the Area Under -the ROC Curve (AUC) score. Only for binary classification tasks. + +Plot the Receiver Operating Characteristic curve. The legend shows the +Area Under the ROC Curve (AUC) score. Only for binary classification tasks.
    Parameters:
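A minimal sketch; `dataset="both"` (train and test curves in one figure) is an assumption beyond the `"test"` default shown in the signature.

```python
# Sketch only: dataset="both" is assumed to overlay train and test curves.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["LR", "LGB"], metric="roc_auc")

# ROC curves with the AUC score per model in the legend
atom.plot_roc(dataset="both")
```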
    diff --git a/docs_sources/api/plots/plot_scatter_matrix.md b/docs_sources/api/plots/plot_scatter_matrix.md index 95efeb703..b59cf4f87 100644 --- a/docs_sources/api/plots/plot_scatter_matrix.md +++ b/docs_sources/api/plots/plot_scatter_matrix.md @@ -3,7 +3,7 @@
    method plot_scatter_matrix(columns=None, title=None, figsize=(10, 10), filename=None, display=True, **kwargs)
    -
+ Plot a matrix of scatter plots. A subset of at most 250 random samples is selected from every column to avoid cluttering the plot.
    Parameters:
    diff --git a/docs_sources/api/plots/plot_successive_halving.md b/docs_sources/api/plots/plot_successive_halving.md index 537b9f1d4..929a2496b 100644 --- a/docs_sources/api/plots/plot_successive_halving.md +++ b/docs_sources/api/plots/plot_successive_halving.md @@ -3,7 +3,7 @@
    method plot_successive_halving(models=None, metric=0, title=None,
                                    figsize=(10, 6), filename=None, display=True)
    -
    + Plot of the models' scores per iteration of the successive halving. Only available if the models were fitted using [successive halving](../../../user_guide/#successive-halving).
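A sketch of the successive-halving prerequisite; `successive_halving()` and its arguments are assumptions based on the user guide reference, the plot call follows the signature above.

```python
# Sketch only: successive_halving() arguments are assumed.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.successive_halving(models=["Tree", "Bag", "ET", "RF"], metric="f1")

# Score per halving iteration (4 -> 2 -> 1 models here)
atom.plot_successive_halving(metric=0)
```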
    diff --git a/docs_sources/api/plots/plot_threshold.md b/docs_sources/api/plots/plot_threshold.md index 1d45ef6ea..e157ce284 100644 --- a/docs_sources/api/plots/plot_threshold.md +++ b/docs_sources/api/plots/plot_threshold.md @@ -3,8 +3,9 @@
    method plot_threshold(models=None, metric=None, dataset="test", steps=100,
                           title=None, figsize=(10, 6), filename=None, display=True)
    -
    -Plot metric performances against threshold values. Only for binary classification tasks. + +Plot metric performances against threshold values. Only for binary +classification tasks.
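A hedged sketch; passing an explicit metric name (rather than the default `None`) is an assumption, the `steps` parameter comes from the signature above.

```python
# Sketch only: passing a metric name to plot_threshold is assumed.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models="LGB", metric="f1")

# f1 vs probability threshold, evaluated at 100 thresholds
atom.plot_threshold(metric="f1", steps=100)
```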
    diff --git a/docs_sources/api/plots/scatter_plot.md b/docs_sources/api/plots/scatter_plot.md index eae1d792a..c324a2116 100644 --- a/docs_sources/api/plots/scatter_plot.md +++ b/docs_sources/api/plots/scatter_plot.md @@ -3,27 +3,28 @@
    method scatter_plot(models=None, index=None, feature=0, target=1,
                         title=None, figsize=(10, 6), filename=None, display=True, **kwargs)
    -
    -Plot SHAP's scatter plot. Plots the value of the feature on the x-axis and -the SHAP value of the same feature on the y-axis. This shows how the model -depends on the given feature, and is like a richer extension of the classical -partial dependence plots. Vertical dispersion of the data points represents -interaction effects. Read more about SHAP plots in the [user guide](../../../user_guide/#shap). + +Plot SHAP's scatter plot. Plots the value of the feature on the x-axis +and the SHAP value of the same feature on the y-axis. This shows how +the model depends on the given feature, and is like a richer extension +of the classical partial dependence plots. Vertical dispersion of the +data points represents interaction effects. Read more about SHAP plots +in the [user guide](../../../user_guide/#shap).
    Parameters:
    Parameters: models: str, sequence or None, optional (default=None)
    -Name of the models to plot. If None, all models in the pipeline are selected. Note - that selecting multiple models will raise an exception. To avoid this, call the - plot from a model. +Name of the models to plot. If None, all models in the pipeline are +selected. Note that selecting multiple models will raise an exception. +To avoid this, call the plot from a model.
    index: tuple, slice or None, optional (default=None)
    -Indices of the rows in the dataset to plot. If tuple (n, m), it selects rows -n until m. If None, it selects all rows in the test set. The scatter plot does -not support plotting a single sample. +Indices of the rows in the dataset to plot. If tuple (n, m), it selects +rows n until m. If None, it selects all rows in the test set. The scatter +plot does not support plotting a single sample.
    feature: int or str, optional (default=0)
    @@ -31,8 +32,8 @@ Index or name of the feature to plot.
    target: int or str, optional (default=1)
    -Index or name of the class in the target column to look at. Only for multi-class - classification tasks. +Index or name of the class in the target column to look at. Only for +multi-class classification tasks.
    title: str or None, optional (default=None)
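Because selecting multiple models raises an exception (see above), this sketch calls the plot from a single model; the lowercase model accessor (`atom.lgb`) and the feature name are assumptions.

```python
# Sketch only: the atom.lgb accessor and the feature name are illustrative.
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=["LR", "LGB"], metric="f1")

# SHAP value of "mean radius" (y-axis) vs its value (x-axis), test set rows
atom.lgb.scatter_plot(feature="mean radius")
```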
    diff --git a/docs_sources/api/plots/waterfall_plot.md b/docs_sources/api/plots/waterfall_plot.md index e96290960..3b0953ddd 100644 --- a/docs_sources/api/plots/waterfall_plot.md +++ b/docs_sources/api/plots/waterfall_plot.md @@ -3,7 +3,7 @@
    method waterfall_plot(models=None, index=None, show=None, target=1,
                           title=None, figsize=None, filename=None, display=True)
    -
    + Plot SHAP's waterfall plot for a single prediction. The SHAP value of a feature represents the impact of the evidence provided by that feature on the model’s output. The waterfall plot diff --git a/docs_sources/api/predicting/decision_function.md b/docs_sources/api/predicting/decision_function.md index 6e1c27817..661466df7 100644 --- a/docs_sources/api/predicting/decision_function.md +++ b/docs_sources/api/predicting/decision_function.md @@ -2,7 +2,7 @@ -------------------
    method decision_function(X, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch and return predicted confidence scores. If called from a trainer, it will use the best model in the pipeline (under the `winner` attribute). If called diff --git a/docs_sources/api/predicting/predict.md b/docs_sources/api/predicting/predict.md index 343f21c04..ac02701bf 100644 --- a/docs_sources/api/predicting/predict.md +++ b/docs_sources/api/predicting/predict.md @@ -2,7 +2,7 @@ ---------
    method predict(X, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch and return class predictions. If called from a trainer, it will use the best model in the pipeline (under the `winner` attribute). If called from a diff --git a/docs_sources/api/predicting/predict_log_proba.md b/docs_sources/api/predicting/predict_log_proba.md index 5c1a5af2f..c88f99a41 100644 --- a/docs_sources/api/predicting/predict_log_proba.md +++ b/docs_sources/api/predicting/predict_log_proba.md @@ -2,7 +2,7 @@ -------------------
    method predict_log_proba(X, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch and return class log-probabilities. If called from a trainer, it will use the best model in the pipeline (under the `winner` attribute). If called diff --git a/docs_sources/api/predicting/predict_proba.md b/docs_sources/api/predicting/predict_proba.md index 5ea140225..da4dee0d6 100644 --- a/docs_sources/api/predicting/predict_proba.md +++ b/docs_sources/api/predicting/predict_proba.md @@ -2,7 +2,7 @@ ---------------
    method predict_proba(X, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch and return class probabilities. If called from a trainer, it will use the best model in the pipeline (under the `winner` attribute). If called from diff --git a/docs_sources/api/predicting/score.md b/docs_sources/api/predicting/score.md index 67c7d568e..afe52eefa 100644 --- a/docs_sources/api/predicting/score.md +++ b/docs_sources/api/predicting/score.md @@ -2,7 +2,7 @@ -------
    method score(X, y, sample_weights=None, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch and return the model's score. If called from a trainer, it will use the best model in the pipeline (under the `winner` attribute). If called diff --git a/docs_sources/api/predicting/transform.md b/docs_sources/api/predicting/transform.md index deb90e548..c7a1d3644 100644 --- a/docs_sources/api/predicting/transform.md +++ b/docs_sources/api/predicting/transform.md @@ -2,7 +2,7 @@ -----------
    method transform(X, y=None, pipeline=None, verbose=None) 
    -
    + Transform new data through all transformers in a branch. By default, transformers that are applied on the training set only are not used during the transformations. Use the `pipeline` parameter to customize diff --git a/docs_sources/api/training/directclassifier.md b/docs_sources/api/training/directclassifier.md index 16bc871e6..294505c4a 100644 --- a/docs_sources/api/training/directclassifier.md +++ b/docs_sources/api/training/directclassifier.md @@ -426,7 +426,7 @@ Fontsize for the ticks along the plot's axes.
    method calibrate(**kwargs)
    -
    + Applies probability calibration on the winning model. The calibration is performed using sklearn's [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) class. The model is trained via cross-validation on a subset of the training data, @@ -450,7 +450,7 @@ this only if you have another, independent set for testing.
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -490,7 +490,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -509,7 +509,7 @@ Name of the models to clear from the pipeline. If None, clear all models.
    method get_class_weight(dataset="train")
    -
    + Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -562,7 +562,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout. @@ -590,7 +590,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -621,7 +621,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -646,7 +646,7 @@ Whether to save the data as an attribute of the instance. If False, remember to
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -704,7 +704,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -741,7 +741,7 @@ When False, only the predictions of estimators are used
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
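A hedged end-to-end sketch of the trainer workflow on this page: the `DirectClassifier` constructor arguments and the `(train, test)` input format are assumptions, while `voting()`, `stacking()` and `scoring()` follow the signatures documented above.

```python
# Sketch only: constructor arguments and the (train, test) input format
# are assumed; voting/stacking/scoring follow the documented signatures.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from atom.training import DirectClassifier

data = load_breast_cancer(as_frame=True).frame        # target is the last column
train, test = train_test_split(data, test_size=0.2, random_state=1)

trainer = DirectClassifier(models=["LR", "RF", "LGB"], metric="f1", random_state=1)
trainer.run(train, test)

trainer.voting(models=["LR", "RF", "LGB"])    # combine the fitted models' predictions
trainer.stacking(models=["LR", "RF", "LGB"])  # meta-estimator on their predictions
trainer.scoring("f1")                         # scores for every model, incl. the ensembles
```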
    diff --git a/docs_sources/api/training/directregressor.md b/docs_sources/api/training/directregressor.md index 5286602ce..f7a11d791 100644 --- a/docs_sources/api/training/directregressor.md +++ b/docs_sources/api/training/directregressor.md @@ -412,7 +412,7 @@ Fontsize for the ticks along the plot's axes.
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -452,7 +452,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -496,7 +496,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
    @@ -524,7 +524,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -555,7 +555,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -580,7 +580,7 @@ Whether to save the data as an attribute of the instance. If False, remember to
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -626,7 +626,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -663,7 +663,7 @@ original training data.
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
    diff --git a/docs_sources/api/training/successivehalvingclassifier.md b/docs_sources/api/training/successivehalvingclassifier.md index 4d8acc453..1204c08cb 100644 --- a/docs_sources/api/training/successivehalvingclassifier.md +++ b/docs_sources/api/training/successivehalvingclassifier.md @@ -431,7 +431,7 @@ Fontsize for the ticks along the plot's axes.
    method calibrate(**kwargs)
    -
    + Applies probability calibration on the winning model. The calibration is performed using sklearn's [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) class. The model is trained via cross-validation on a subset of the training data, @@ -455,7 +455,7 @@ this only if you have another, independent set for testing.
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -495,7 +495,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -514,7 +514,7 @@ Name of the models to clear from the pipeline. If None, clear all models.
    method get_class_weight(dataset="train")
    -
    + Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -567,7 +567,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
    @@ -595,7 +595,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -626,7 +626,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -650,7 +650,7 @@ add the data to ATOMLoader when loading the
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -707,7 +707,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -744,7 +744,7 @@ original training data.
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
    diff --git a/docs_sources/api/training/successivehalvingregressor.md b/docs_sources/api/training/successivehalvingregressor.md index 6e4cfc1af..bf905f9aa 100644 --- a/docs_sources/api/training/successivehalvingregressor.md +++ b/docs_sources/api/training/successivehalvingregressor.md @@ -418,7 +418,7 @@ Fontsize for the ticks along the plot's axes.
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -458,7 +458,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -502,7 +502,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
    @@ -530,7 +530,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -561,7 +561,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -585,7 +585,7 @@ add the data to ATOMLoader when loading the
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -630,7 +630,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -667,7 +667,7 @@ When False, only the predictions of estimators are used
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
    diff --git a/docs_sources/api/training/trainsizingclassifier.md b/docs_sources/api/training/trainsizingclassifier.md index 95cc5ea65..1037faea5 100644 --- a/docs_sources/api/training/trainsizingclassifier.md +++ b/docs_sources/api/training/trainsizingclassifier.md @@ -434,7 +434,7 @@ Fontsize for the ticks along the plot's axes.
    method calibrate(**kwargs)
    -
    + Applies probability calibration on the winning model. The calibration is performed using sklearn's [CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html) class. The model is trained via cross-validation on a subset of the training data, @@ -458,7 +458,7 @@ Additional keyword arguments for the CalibratedClassifierCV instance. Using
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -498,7 +498,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -517,7 +517,7 @@ Name of the models to clear from the pipeline. If None, clear all models.
    method get_class_weight(dataset="train")
    -
    + Return class weights for a balanced data set. Statistically, the class weights re-balance the data set so that the sampled data set represents the target population as closely as reasonably possible. The returned weights are inversely @@ -570,7 +570,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
    @@ -598,7 +598,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -629,7 +629,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -653,7 +653,7 @@ add the data to ATOMLoader when loading the
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -710,7 +710,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -747,7 +747,7 @@ original training data.
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
    diff --git a/docs_sources/api/training/trainsizingregressor.md b/docs_sources/api/training/trainsizingregressor.md index 77ade49fb..7eae22b3b 100644 --- a/docs_sources/api/training/trainsizingregressor.md +++ b/docs_sources/api/training/trainsizingregressor.md @@ -422,7 +422,7 @@ Fontsize for the ticks along the plot's axes.
    method canvas(nrows=1, ncols=2, title=None, figsize=None, filename=None, display=True)
    -
    + This `@contextmanager` allows you to draw many plots in one figure. The default option is to add two plots side by side. See the [user guide](../../../user_guide/#canvas) for an example use case. @@ -462,7 +462,7 @@ Whether to render the plot.
    method delete(models=None)
    -
    + Removes a model from the pipeline. If all models in the pipeline are removed, the metric is reset. Use this method to remove unwanted models or to free some memory before saving the instance. @@ -506,7 +506,7 @@ Dictionary of the parameter names mapped to their values.
    method log(msg, level=0)
    -
    + Write a message to the logger and print it to stdout.
    @@ -534,7 +534,7 @@ Reset the [plot aesthetics](../../../user_guide/#aesthetics) to their default va
    method reset_predictions()
    -
    + Clear the [prediction attributes](../../../user_guide/#predicting) from all models. Use this method to free some memory before saving the trainer.


    @@ -565,7 +565,7 @@ Training set and test set. Allowed input formats are:
    method save(filename=None, save_data=True)
    -
    + Save the instance to a pickle file. Remember that the class contains the complete dataset as attribute, so the file can become large for big datasets! To avoid this, use `save_data=False`. @@ -589,7 +589,7 @@ add the data to ATOMLoader when loading the
    method scoring(metric=None, dataset="test", **kwargs)
    -
    + Print all the models' scoring for a specific metric.
    @@ -634,7 +634,7 @@ Estimator instance.
    method stacking(models=None, estimator=None, stack_method="auto", passthrough=False)
    -
    + Add a Stacking instance to the models in the pipeline.
    @@ -671,7 +671,7 @@ original training data.
    method voting(models=None, weights=None)
    -
    + Add a Voting instance to the models in the pipeline.
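To close the trainer pages, a hedged sketch of fitting, scoring and saving a `TrainSizingRegressor`; the constructor arguments and the `(train, test)` input are assumptions, `scoring()` and `save(save_data=False)` follow the signatures above.

```python
# Sketch only: constructor arguments and the (train, test) format are assumed.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from atom.training import TrainSizingRegressor

data = load_diabetes(as_frame=True).frame             # target is the last column
train, test = train_test_split(data, test_size=0.2, random_state=1)

trainer = TrainSizingRegressor(models=["OLS", "LGB"], metric="r2", random_state=1)
trainer.run(train, test)

trainer.scoring("r2")                    # print every model's test score
trainer.save("ts_reg", save_data=False)  # small pickle; data is not stored inside
```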