Commit

[fix] Address the Ravin's comments
nabenabe0928 committed Nov 11, 2021
1 parent 7324841 commit c8dcf23
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions README.md
@@ -19,15 +19,20 @@ In the figure, **Data** is provided by the user and
**Portfolio** is a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either
a regression or a classification task on the provided dataset.
The API then runs the following procedure:
1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it
2. **Create dataset**: Create a dataset that can be handled in this API including cross-validation splits
3. **Evaluate baselines**: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, along with a dummy model from `sklearn.dummy` that represents the worst possible performance
1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross-validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, along with a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
b. Sample a target algorithm and the corresponding hyperparameter configuration by SMAC\
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
c. Update the observations with the obtained results\
d. Repeat a. -- c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations using [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf)
5. Build the best ensemble for the provided dataset from the observations using [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
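
Step 5 links to the ensemble selection paper by Caruana et al. As a rough illustration of that idea, and not Auto-PyTorch's actual implementation, the sketch below greedily adds, with replacement, whichever model most improves the accuracy of the averaged predictions. The function names, the accuracy metric, and the data layout (one list of class probabilities per sample) are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence

Predictions = Sequence[Sequence[float]]  # one row of class probabilities per sample


def accuracy(probs: Predictions, labels: Sequence[int]) -> float:
    """Fraction of samples whose most probable class matches the label."""
    hits = 0
    for row, label in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return hits / len(labels)


def average(members: List[Predictions]) -> List[List[float]]:
    """Element-wise mean of the members' class-probability predictions."""
    n = len(members)
    return [[sum(col) / n for col in zip(*rows)] for rows in zip(*members)]


def greedy_ensemble_selection(
    predictions: Dict[str, Predictions],
    labels: Sequence[int],
    ensemble_size: int,
) -> Counter:
    """Pick `ensemble_size` members greedily, allowing repeats.

    Returns how often each model was chosen; the counts act as ensemble weights.
    """
    chosen: List[str] = []
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        for name in sorted(predictions):  # sorted for deterministic tie-breaking
            candidate = [predictions[m] for m in chosen + [name]]
            score = accuracy(average(candidate), labels)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return Counter(chosen)
```

In practice the predictions would come from a validation split, so that the ensemble is not fitted to the data the individual models were trained on.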

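Step 4a refers to Hyperband for the budget and cut-off rules. The sketch below computes the bracket schedule from the formulas in the linked paper: for each bracket, how many configurations start and how much budget each one receives per round. It is a stand-alone illustration, not the code path SMAC or Auto-PyTorch uses; the function name and the assumption of a minimum budget of 1 are choices made for this example.

```python
import math
from typing import List, Tuple


def hyperband_brackets(max_budget: float, eta: int = 3) -> List[List[Tuple[int, float]]]:
    """Return, for each bracket, its rounds as (num_configs, budget_per_config)."""
    if eta < 2 or max_budget < 1:
        raise ValueError("expected eta >= 2 and max_budget >= 1")

    # Integer form of floor(log_eta(max_budget)), avoiding float log round-off.
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations and initial budget of this bracket.
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        r = max_budget / eta ** s
        # Each round keeps roughly 1/eta of the configurations
        # and multiplies the per-configuration budget by eta.
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets


if __name__ == "__main__":
    for rounds in hyperband_brackets(max_budget=81, eta=3):
        print(rounds)
```

The first bracket explores many configurations on a small budget, while the last one runs a few configurations on the full budget.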
*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either a regression or a classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, and their corresponding hyperparameters.
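
To make the idea of a pipeline hyperparameter configuration more concrete, here is a minimal sketch of a search space in which each step offers a choice of components and each component carries its own hyperparameters. Every step name, component name, and range below is hypothetical and does not reflect Auto-PyTorch's real search space. Uniform random sampling stands in for SMAC, which proposes configurations with a model-based strategy instead.

```python
import random
from typing import Any, Dict

# Hypothetical search space: step -> component -> hyperparameter ranges.
# Integer bounds mean an integer hyperparameter, float bounds a continuous one.
SEARCH_SPACE = {
    "encoder": {
        "one_hot": {},
        "ordinal": {},
    },
    "network": {
        "mlp": {"num_layers": (1, 4), "num_units": (16, 256)},
        "resnet": {"num_blocks": (1, 3), "num_units": (16, 256)},
    },
    "optimizer": {
        "adam": {"log10_lr": (-5.0, -1.0)},
        "sgd": {"log10_lr": (-5.0, -1.0), "momentum": (0.0, 0.99)},
    },
}


def sample_configuration(rng: random.Random) -> Dict[str, Dict[str, Any]]:
    """Pick one component per step, then sample only that component's hyperparameters."""
    config: Dict[str, Dict[str, Any]] = {}
    for step, components in SEARCH_SPACE.items():
        name = rng.choice(sorted(components))
        params: Dict[str, Any] = {}
        for key, (low, high) in components[name].items():
            if isinstance(low, int):
                params[key] = rng.randint(low, high)  # inclusive integer range
            else:
                params[key] = rng.uniform(low, high)
        config[step] = {"choice": name, **params}
    return config


if __name__ == "__main__":
    print(sample_configuration(random.Random(0)))
```

The hyperparameters are conditional on the chosen component: a configuration that selects `sgd` has a `momentum` entry, while one that selects `adam` does not.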

## Installation
