[doc] Add workflow of the AutoPytorch

automl · Nov 22, 2021 · commit 4ca3676 · 1 parent 1e06cce

Showing 2 changed files with 29 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -9,6 +9,33 @@ We plan to enable image data and time-series data.
 
 Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)
 
+## Workflow
+
+The following figure gives a rough overview of the Auto-PyTorch workflow.
+
+<img src="figs/apt_workflow.png" width="500">
+
+In the figure, **Data** is provided by the user and
+**Portfolio** is a set of neural network configurations that work well on diverse
+datasets.
+The current version only supports the *greedy portfolio* as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
+This portfolio is used to warm-start the optimization of SMAC.
+In other words, the configurations in the portfolio are evaluated on the provided data first and serve as the initial configurations.
+The API then runs the following steps:
+1. **Validate input data**: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
+2. **Create dataset**: Create a dataset that the API can handle, with a choice of cross-validation or holdout splits.
+3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from `sklearn.dummy` that represents the worst possible performance.
+4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
+    a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
+    b. Sample a pipeline hyperparameter configuration *2 by SMAC\
+    c. Update the observations with the obtained results\
+    d. Repeat a. -- c. until the budget runs out
+5. Build the best ensemble for the provided dataset from the observations using [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
+
+*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either a regression or a classification task on the provided dataset.
+
+*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network,
+and their corresponding hyperparameters.
 
 ## Installation
 
@@ -25,8 +52,8 @@ We recommend using Anaconda for developing as follows:
 git submodule update --init --recursive
 
 # Create the environment
-conda create -n autopytorch python=3.8
-conda activate autopytorch
+conda create -n auto-pytorch python=3.8
+conda activate auto-pytorch
 conda install swig
 cat requirements.txt | xargs -n 1 -L 1 pip install
 python setup.py install
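
Note on step 5 of the added workflow: the linked Caruana et al. paper describes greedy forward selection with replacement, and the idea can be sketched in a few lines of plain Python. This is only an illustrative sketch, not Auto-PyTorch's implementation. The function names `accuracy` and `greedy_ensemble_selection`, the use of accuracy as the metric, and the list-of-lists input format are all assumptions made for the example.

```python
from collections import Counter


def accuracy(probs, labels):
    """Fraction of samples whose most probable class matches the label."""
    hits = sum(
        1 for row, y in zip(probs, labels)
        if max(range(len(row)), key=row.__getitem__) == y
    )
    return hits / len(labels)


def greedy_ensemble_selection(model_probs, labels, ensemble_size):
    """Greedy forward selection with replacement (hypothetical helper).

    model_probs: one entry per model; each entry holds the per-sample
                 class probabilities predicted on a validation set.
    Returns a dict mapping model index -> ensemble weight.
    """
    n_samples = len(labels)
    n_classes = len(model_probs[0][0])
    running_sum = [[0.0] * n_classes for _ in range(n_samples)]
    chosen = []

    for _ in range(ensemble_size):
        best_idx, best_score = None, -1.0
        for idx, probs in enumerate(model_probs):
            # Summed probabilities have the same argmax as averaged ones,
            # so the division by the ensemble size can be skipped here.
            candidate = [
                [s + p for s, p in zip(sum_row, row)]
                for sum_row, row in zip(running_sum, probs)
            ]
            score = accuracy(candidate, labels)
            if score > best_score:  # ties keep the earlier model
                best_idx, best_score = idx, score
        chosen.append(best_idx)
        running_sum = [
            [s + p for s, p in zip(sum_row, row)]
            for sum_row, row in zip(running_sum, model_probs[best_idx])
        ]

    # A model picked k times gets weight k / ensemble_size.
    counts = Counter(chosen)
    return {idx: n / ensemble_size for idx, n in counts.items()}
```

Because a model may be selected more than once, the returned weights act as a coarse weighting of the evaluated configurations, and they always sum to one.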

diff --git a/figs/apt_workflow.png b/figs/apt_workflow.png