My_AutoML v0.0.1 Basic workable AutoML

@PanyiDong PanyiDong released this 07 Apr 21:48
· 395 commits to master since this release

AutoML pipeline

The pipeline is intended as an AutoML tool for tabular regression/classification tasks.

The basic working pipeline consists of the following stages, in order: encoding, imputation, balancing, scaling, feature selection, and a regression/classification model.
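
As a rough illustration of how such a staged pipeline can be chained, here is a minimal, dependency-free sketch. It is not the package's implementation: the classes `MeanImputer` and `Standardizer`, the helper `run_stages`, and the `fit`/`transform` interface are all hypothetical names chosen for this example, and only two of the stages (imputation and scaling) are shown.

```python
from statistics import mean, pstdev


class MeanImputer:
    """Hypothetical imputation stage: replace None with the column mean."""

    def fit(self, rows):
        cols = list(zip(*rows))
        # Column means are computed over observed (non-None) values only.
        self.means_ = [mean(v for v in col if v is not None) for col in cols]
        return self

    def transform(self, rows):
        return [
            [self.means_[j] if v is None else v for j, v in enumerate(row)]
            for row in rows
        ]


class Standardizer:
    """Hypothetical scaling stage: zero mean, unit variance per column."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means_ = [mean(col) for col in cols]
        # Fall back to 1.0 so constant columns do not divide by zero.
        self.stds_ = [pstdev(col) or 1.0 for col in cols]
        return self

    def transform(self, rows):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
            for row in rows
        ]


def run_stages(stages, rows):
    """Fit each stage on the output of the previous one, then transform."""
    for stage in stages:
        rows = stage.fit(rows).transform(rows)
    return rows


data = [[1.0, None], [3.0, 4.0], [5.0, 8.0]]
out = run_stages([MeanImputer(), Standardizer()], data)
```

The order matters here: the scaler can only compute column statistics once the imputer has filled the missing value. In a real setting each stage would be fitted on training data only and then applied to held-out data.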

The pipeline performs automated model selection and hyperparameter optimization using HyperOpt.
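
To make the idea of joint model selection and hyperparameter optimization concrete, here is a small sketch of a conditional search space, where each candidate model carries its own hyperparameters. Several things in it are assumptions: it uses plain random search from the standard library as a stand-in for HyperOpt's search algorithms, the hyperparameter names `max_depth` and `alpha` and their ranges are illustrative, and `toy_loss` is a placeholder objective rather than a real train-and-validate step.

```python
import math
import random

# Conditional search space: the model choice decides which hyperparameters exist.
# Model names come from the lists in this release; the hyperparameters are assumed.
SEARCH_SPACE = {
    "DecisionTree": {"max_depth": lambda rng: rng.randint(1, 10)},
    "SGD": {"alpha": lambda rng: 10 ** rng.uniform(-5, -1)},
}


def sample_config(rng):
    """Draw one model and the hyperparameters that belong to it."""
    model = rng.choice(sorted(SEARCH_SPACE))
    params = {name: draw(rng) for name, draw in SEARCH_SPACE[model].items()}
    return {"model": model, "params": params}


def toy_loss(config):
    """Placeholder objective; a real one would train and validate a model."""
    if config["model"] == "DecisionTree":
        return abs(config["params"]["max_depth"] - 4) / 10
    return abs(math.log10(config["params"]["alpha"]) + 3) / 10


def random_search(loss_fn, n_trials, seed=0):
    """Keep the configuration with the lowest loss over n_trials samples."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = loss_fn(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(toy_loss, n_trials=50)
```

The result is a single dictionary naming both the chosen model and its hyperparameters, which is the shape of answer a combined search produces. A model-based optimizer would replace the uniform sampling in `sample_config` with proposals informed by earlier trials.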

Current methods in the pipeline (deprecated methods are not listed):

1. Encoding

DataEncoding

2. Imputation

SimpleImputer, JointImputer, ExpectationMaximization, KNNImputer, MissForestImputer, MICE, GAIN

3. Balancing

SimpleRandomOverSampling, SimpleRandomUnderSampling, TomekLink, EditedNearestNeighbor, CondensedNearestNeighbor, OneSidedSelection, CNN_TomekLink, Smote, Smote_TomekLink, Smote_ENN

4. Scaling

MinMaxScale, Standardize, Normalize, RobustScale, PowerTransformer, QuantileTransformer, Winsorization

5. Feature Selection

RBFSampler, FeatureFilter, ASFFS, GeneticAlgorithm, extra_trees_preproc_for_classification/extra_trees_preproc_for_regression, liblinear_svc_preprocessor, polynomial, select_percentile_classification/select_percentile_regression, select_rates_classification/select_rates_regression, truncatedSVD

6. Regression

AdaboostRegressor, ARDRegression, DecisionTree, ExtraTreesRegressor, GaussianProcess, GradientBoosting, KNearestNeighborsRegressor, LibLinear_SVR, LibSVM_SVR, MLPRegressor, RandomForest, SGD

7. Classification

AdaboostClassifier, BernoulliNB, DecisionTree, ExtraTreesClassifier, GaussianNB, GradientBoostingClassifier, NearestNeighborsClassifier, LDA, LibLinear_SVC, LibSVM_SVC, MLPClassifier, MultinomialNB, PassiveAggressive, QDA, RandomForest, SGD

Work in progress

1. Use Ray to tune the pipeline (working; the outputs are still being worked on)

2. Use an MLP for tabular classification/regression and an RNN structure for text processing (partial code ready)