[fix] Address Ravin's comments

automl · Nov 11, 2021 · c8dcf23
1 parent 7324841
commit c8dcf23
Showing 1 changed file with 10 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -19,15 +19,20 @@ In the figure, **Data** is provided by user and
 **Portfolio** is a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machine, to solve either
 regression or classification task on the provided dataset.
 Then API starts the following procedures:
-1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-Pytorch can handled
-2. **Create dataset**: Create a dataset that can be handled in this API including cross validation splits
-3. **Evaluate baselines**: Train each algorithm in the predefined pool with a fixed hyperparameter configuration and dummy model from `sklearn.dummy` that represents the worst possible performance
+1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it.
+2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross-validation or holdout splits.
+3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, along with a dummy model from `sklearn.dummy` that represents the worst possible performance.
 4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
     a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
-    b. Sample a target algorithm and the corresponding hyperparameter configuration by SMAC\
+    b. Sample a pipeline hyperparameter configuration *2 by SMAC\
     c. Update the observations by obtained results\
     d. Repeat a. -- c. until the budget runs out
-5. Build the best ensemble for the provided dataset from the observations and [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf)
+5. Build the best ensemble for the provided dataset from the observations and [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
+
+*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machine, to solve either a regression or a classification task on the provided dataset.
+
+*2: A pipeline hyperparameter configuration specifies the choice of components, e.g. target algorithm, the shape of neural networks, in each step and
+their corresponding hyperparameters.
 
 ## Installation
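
Step 5 of the updated README links to the ensemble selection paper but does not show what the selection looks like. For readers of this commit, here is a minimal sketch of greedy forward selection with replacement over validation predictions. It is not Auto-PyTorch's implementation: the function names `accuracy` and `greedy_ensemble`, the binary-classification setting, the 0.5 threshold, and the accuracy metric are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of binary labels recovered by thresholding at 0.5 (assumed metric)."""
    hits = sum(int(p >= 0.5) == y for p, y in zip(probs, labels))
    return hits / len(labels)


def greedy_ensemble(
    val_preds: Dict[str, List[float]],
    labels: List[int],
    ensemble_size: int = 5,
) -> List[str]:
    """Greedy forward selection with replacement (hypothetical helper).

    val_preds maps a model name to its predicted probabilities on the
    validation split. A model may be picked more than once, which acts
    as a weight in the final uniform average.
    """
    chosen: List[str] = []
    running_sum = [0.0] * len(labels)
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        k = len(chosen) + 1
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(val_preds):
            averaged = [(s + p) / k for s, p in zip(running_sum, val_preds[name])]
            score = accuracy(averaged, labels)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return chosen


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    val_preds = {
        "a": [0.9, 0.1, 0.4, 0.2],
        "b": [0.6, 0.6, 0.9, 0.1],
        "dummy": [0.5, 0.5, 0.5, 0.5],  # stands in for the dummy baseline
    }
    print(greedy_ensemble(val_preds, labels, ensemble_size=3))
```

In this toy data, neither `a` nor `b` is perfect alone, but their average classifies every validation sample correctly, which is why the selection adds `b` in the second round instead of repeating `a`.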