diff --git a/README.md b/README.md
index 7a761cf7282..11b71a999b2 100644
--- a/README.md
+++ b/README.md
@@ -3,43 +3,61 @@
-[![Build Status](https://travis-ci.org/kubeflow/katib.svg?branch=master)](https://travis-ci.org/kubeflow/katib)
+[![Build Status](https://travis-ci.com/kubeflow/katib.svg?branch=master)](https://travis-ci.com/kubeflow/katib)
[![Coverage Status](https://coveralls.io/repos/github/kubeflow/katib/badge.svg?branch=master)](https://coveralls.io/github/kubeflow/katib?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/katib)](https://goreportcard.com/report/github.com/kubeflow/katib)
-
-Katib is a Kubernetes-based system for [Hyperparameter Tuning][1] and [Neural Architecture Search][2]. Katib supports a number of ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
-
-Table of Contents
-=================
-
- * [Getting Started](#getting-started)
- * [Name](#name)
- * [Concepts in Katib](#concepts-in-katib)
- * [Experiment](#experiment)
- * [Suggestion](#suggestion)
- * [Trial](#trial)
- * [Worker Job](#worker-job)
- * [Components in Katib](#components-in-katib)
- * [Web UI](#web-ui)
- * [API documentation](#api-documentation)
- * [Installation](#installation)
- * [TF operator](#tf-operator)
- * [PyTorch operator](#pytorch-operator)
- * [Katib](#katib)
- * [Running examples](#running-examples)
- * [Cleanups](#cleanups)
- * [Katib SDK](#katib-sdk)
- * [Quick Start](#quick-start)
- * [Who are using Katib?](#who-are-using-katib)
- * [Citation](#citation)
- * [CONTRIBUTING](#contributing)
-
-Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)
+[![Releases](https://img.shields.io/github/release-pre/kubeflow/katib.svg?sort=semver)](https://github.com/kubeflow/katib/releases)
+[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://kubeflow.slack.com/archives/C018PMV53NW)
+
+Katib is a Kubernetes-native project for automated machine learning (AutoML).
+Katib supports
+[Hyperparameter Tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization),
+[Early Stopping](https://en.wikipedia.org/wiki/Early_stopping), and
+[Neural Architecture Search](https://en.wikipedia.org/wiki/Neural_architecture_search).
+
+Katib is agnostic to machine learning (ML) frameworks.
+It can tune hyperparameters of applications written in any language of the
+users’ choice and natively supports many ML frameworks, such as TensorFlow,
+MXNet, PyTorch, XGBoost, and others.
+
+
+# Table of Contents
+
+- [Getting Started](#getting-started)
+- [Name](#name)
+- [Concepts in Katib](#concepts-in-katib)
+ - [Experiment](#experiment)
+ - [Suggestion](#suggestion)
+ - [Trial](#trial)
+ - [Worker Job](#worker-job)
+ - [Search Algorithms](#search-algorithms)
+ - [Hyperparameter Tuning](#hyperparameter-tuning)
+ - [Neural Architecture Search](#neural-architecture-search)
+- [Components in Katib](#components-in-katib)
+- [Web UI](#web-ui)
+- [GRPC API documentation](#grpc-api-documentation)
+- [Installation](#installation)
+ - [TF operator](#tf-operator)
+ - [PyTorch operator](#pytorch-operator)
+ - [Katib](#katib)
+ - [Running examples](#running-examples)
+ - [Katib SDK](#katib-sdk)
+ - [Cleanups](#cleanups)
+- [Quick Start](#quick-start)
+- [Who is using Katib?](#who-is-using-katib)
+- [CONTRIBUTING](#contributing)
+- [Citation](#citation)
+
+
+
+Created by [doctoc](https://github.com/thlorenz/doctoc).
## Getting Started
-See the [getting-started
-guide](https://www.kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/)
+Follow the
+[getting-started guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/)
on the Kubeflow website.
## Name
@@ -48,101 +66,132 @@ Katib stands for `secretary` in Arabic.
## Concepts in Katib
-For a detailed description of the concepts in Katib, hyperparameter tuning, and
-neural architecture search, see the [Kubeflow
-documentation](https://www.kubeflow.org/docs/components/hyperparameter-tuning/overview/).
+For a detailed description of the concepts in Katib and AutoML, check the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/overview/).
-Katib has the concepts of Experiment, Trial, Job and Suggestion.
+Katib has the concepts of `Experiment`, `Suggestion`, `Trial` and `Worker Job`.
### Experiment
-`Experiment` represents a single optimization run over a feasible space.
+An `Experiment` represents a single optimization run over a feasible space.
Each `Experiment` contains a configuration:
-1. Objective: What we are trying to optimize.
-2. Search Space: Constraints for configurations describing the feasible space.
-3. Search Algorithm: How to find the optimal configurations.
+1. **Objective**: What you want to optimize.
+2. **Search Space**: Constraints for configurations describing the feasible space.
+3. **Search Algorithm**: How to find the optimal configurations.
-`Experiment` is defined as a CRD. See the detailed guide to [configuring and running a Katib
-experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/)
+Katib `Experiment` is defined as a CRD. Check the detailed guide to
+[configuring and running a Katib `Experiment`](https://kubeflow.org/docs/components/katib/experiment/)
in the Kubeflow docs.
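+
+For illustration, here is a trimmed `Experiment` sketch, based on the
+[tfjob-example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/tfjob-example.yaml)
+in this repository. Check the guide above for the full schema:
+
+```yaml
+apiVersion: kubeflow.org/v1beta1
+kind: Experiment
+metadata:
+  namespace: kubeflow
+  name: tfjob-example
+spec:
+  objective:
+    type: maximize
+    goal: 0.99
+    objectiveMetricName: accuracy_1
+  algorithm:
+    algorithmName: random
+  parallelTrialCount: 3
+  maxTrialCount: 12
+  maxFailedTrialCount: 3
+  parameters:
+    - name: learning_rate
+      parameterType: double
+      feasibleSpace:
+        min: "0.01"
+        max: "0.05"
+  # The trialTemplate (the Worker Job definition) is omitted for brevity,
+  # check the Worker Job section below.
+```
+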
### Suggestion
-A Suggestion is a proposed solution to the optimization problem which is one set of hyperparameter values or a list of parameter assignments. Then a `Trial` will be created to evaluate the parameter assignments.
+A `Suggestion` is a set of hyperparameter values that the hyperparameter tuning
+process has proposed. Katib creates a `Trial` to evaluate
+the suggested set of values.
-`Suggestion` is defined as a CRD.
+Katib `Suggestion` is defined as a CRD.
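+
+For illustration, each proposed set of values is traced in the `Suggestion`
+status. Here is a trimmed sketch taken from the random example in
+[docs/workflow-design.md](./docs/workflow-design.md):
+
+```yaml
+status:
+  suggestions:
+    - name: random-example-2fpnqfv8
+      parameterAssignments:
+        - name: lr
+          value: "0.021135228357807213"
+        - name: num-layers
+          value: "4"
+        - name: optimizer
+          value: sgd
+```
+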
### Trial
-A `Trial` is one iteration of the optimization process, which is one `worker job` instance with a list of parameter assignments(corresponding to a suggestion).
+A `Trial` is one iteration of the hyperparameter tuning process.
+A `Trial` corresponds to one `Worker Job` instance with a list of parameter
+assignments. The list of parameter assignments corresponds to a `Suggestion`.
+
+Each `Experiment` runs several `Trials`. The `Experiment` runs the `Trials` until
+it reaches either the objective or the configured maximum number of `Trials`.
+
+Katib `Trial` is defined as a CRD.
+
+### Worker Job
+
+The `Worker Job` is the process that runs to evaluate a `Trial` and calculate
+its objective value.
+
+The `Worker Job` can be any type of Kubernetes resource or
+[Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
+Follow the [`Trial` template guide](https://www.kubeflow.org/docs/components/katib/trial-template/#custom-resource)
+to support your own Kubernetes resource in Katib.
+
+Katib supports the following job kinds out of the box:
+
+- [Kubernetes `Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
-`Trial` is defined as a CRD.
+- [Kubeflow `TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/)
-### Worker Job
+- [Kubeflow `PyTorchJob`](https://www.kubeflow.org/docs/components/training/pytorch/)
-A `Worker Job` refers to a process responsible for evaluating a `Trial` and calculating its objective value.
+- [Kubeflow `MPIJob`](https://www.kubeflow.org/docs/components/training/mpi/)
-The worker kind can be [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/) which is a non distributed execution, [Kubeflow TFJob](https://www.kubeflow.org/docs/guides/components/tftraining/) or [Kubeflow PyTorchJob](https://www.kubeflow.org/docs/guides/components/pytorch/) which are distributed executions.
-Thus, Katib supports multiple frameworks with the help of different job kinds.
+- [Tekton `Pipeline`](https://github.com/tektoncd/pipeline)
-Currently Katib supports the following exploration algorithms:
+Thus, Katib supports multiple frameworks with the help of different job kinds.
+
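+For illustration, the `Worker Job` is defined by the `trialTemplate` of an
+`Experiment`. Here is a trimmed sketch for a Kubernetes `Job`, based on the
+random example in [docs/workflow-design.md](./docs/workflow-design.md):
+
+```yaml
+trialTemplate:
+  primaryContainerName: training-container
+  trialParameters:
+    - name: learningRate
+      description: Learning rate for the training model
+      reference: lr
+  trialSpec:
+    apiVersion: batch/v1
+    kind: Job
+    spec:
+      template:
+        spec:
+          containers:
+            - name: training-container
+              image: docker.io/kubeflowkatib/mxnet-mnist:v1beta1-e294a90
+              command:
+                - python3
+                - /opt/mxnet-mnist/mnist.py
+                - --lr=${trialParameters.learningRate}
+          restartPolicy: Never
+```
+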
+### Search Algorithms
+
+Katib currently supports several search algorithms. Follow the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/experiment/#search-algorithms-in-detail)
+to learn more about each algorithm.
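+
+You select an algorithm through the `algorithm` field of the `Experiment`.
+Here is a minimal sketch; the available `algorithmSettings` names depend on
+the algorithm, so treat these values as illustrative:
+
+```yaml
+algorithm:
+  algorithmName: bayesianoptimization
+  algorithmSettings:
+    - name: random_state
+      value: "10"
+```
+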
#### Hyperparameter Tuning
-* [Random Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Random_search)
-* [Tree of Parzen Estimators (TPE)](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)
-* [Grid Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search)
-* [Hyperband](https://arxiv.org/pdf/1603.06560.pdf)
-* [Bayesian Optimization](https://arxiv.org/pdf/1012.2599.pdf)
-* [CMA Evolution Strategy](https://arxiv.org/abs/1604.00772)
+- [Random Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Random_search)
+- [Tree of Parzen Estimators (TPE)](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)
+- [Grid Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search)
+- [Hyperband](https://arxiv.org/pdf/1603.06560.pdf)
+- [Bayesian Optimization](https://arxiv.org/pdf/1012.2599.pdf)
+- [Covariance Matrix Adaptation Evolution Strategy (CMA-ES)](https://arxiv.org/abs/1604.00772)
#### Neural Architecture Search
-* [Efficient Neural Architecture Search (ENAS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/enas)
-* [Differentiable Architecture Search (DARTS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/darts)
-
+- [Efficient Neural Architecture Search (ENAS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/enas)
+- [Differentiable Architecture Search (DARTS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/darts)
## Components in Katib
-Katib consists of several components as shown below. Each component is running on k8s as a deployment.
-Each component communicates with others via GRPC and the API is defined at `pkg/apis/manager/v1beta1/api.proto`
-for v1beta1 version and `pkg/apis/manager/v1alpha3/api.proto` for v1alpha3 version.
+Katib consists of several components as shown below. Each component runs
+on Kubernetes as a deployment. The components communicate with each other via
+GRPC, and the API is defined in `pkg/apis/manager/v1beta1/api.proto`.
- Katib main components:
- - katib-db-manager: GRPC API server of Katib which is the DB Interface.
- - katib-mysql: Data storage backend of Katib using mysql.
- - katib-ui: User interface of Katib.
- - katib-controller: Controller for Katib CRDs in Kubernetes.
+  - `katib-db-manager` - the GRPC API server of Katib, which acts as the DB interface.
+  - `katib-mysql` - the data storage backend of Katib using MySQL.
+ - `katib-ui` - the user interface of Katib.
+ - `katib-controller` - the controller for the Katib CRDs in Kubernetes.
## Web UI
Katib provides a Web UI.
-You can visualize general trend of Hyper parameter space and each training history. You can use
-[random-example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml) or
-[other examples](https://github.com/kubeflow/katib/blob/master/examples/v1beta1) to generate a similar UI.
+You can visualize the general trend of the hyperparameter space and
+each training history. You can use
+[random-example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml)
+or
+[other examples](https://github.com/kubeflow/katib/blob/master/examples/v1beta1)
+to generate a similar UI. Follow the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-ui)
+to access the Katib UI.
![katibui](./docs/images/katib-ui.png)
## GRPC API documentation
-See the [Katib v1beta1 API reference docs](https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1beta1/gen-doc/api.md).
-
-See the [Katib v1alpha3 API reference docs](https://www.kubeflow.org/docs/reference/katib/).
+Check the [Katib v1beta1 API reference docs](https://www.kubeflow.org/docs/reference/katib/v1beta1/katib/).
## Installation
-For standard installation of Katib with support for all job operators,
-install Kubeflow. Current official Katib version in Kubeflow latest release is v1alpha3.
-See the documentation:
+For a standard installation of Katib with support for all job operators,
+install Kubeflow.
+Follow the documentation:
-* [Kubeflow installation
-guide](https://www.kubeflow.org/docs/started/getting-started/)
-* [Kubeflow hyperparameter tuning
-guides](https://www.kubeflow.org/docs/components/hyperparameter-tuning/).
+- [Kubeflow installation guide](https://www.kubeflow.org/docs/started/getting-started/)
+- [Kubeflow Katib guides](https://www.kubeflow.org/docs/components/katib/).
-If you install Katib with other Kubeflow components, you can't submit Katib jobs in Kubeflow namespace.
+If you install Katib with other Kubeflow components,
+you can't submit Katib jobs in the Kubeflow namespace. Check the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm)
+to learn more.
-Alternatively, if you want to install Katib manually with TF and PyTorch operators support, follow these steps:
+Alternatively, if you want to install Katib manually with TF and PyTorch
+operator support, follow these steps:
Create Kubeflow namespace:
@@ -166,7 +215,7 @@ For installing TF operator, run the following:
cd "${MANIFESTS_DIR}/tf-training/tf-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/tf-training/tf-job-operator/base"
-kustomize build . | kubectl apply -n kubeflow -f -
+kustomize build . | kubectl apply -f -
```
### PyTorch operator
@@ -177,54 +226,18 @@ For installing PyTorch operator, run the following:
cd "${MANIFESTS_DIR}/pytorch-job/pytorch-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/pytorch-job/pytorch-operator/base/"
-kustomize build . | kubectl apply -n kubeflow -f -
+kustomize build . | kubectl apply -f -
```
### Katib
-Finally, you can install Katib.
-
-For v1beta1 version, run the following:
+Finally, you can install Katib:
```
git clone git@github.com:kubeflow/katib.git
-bash katib/scripts/v1beta1/deploy.sh
+cd katib
+make deploy
```
-For v1alpha3 version, run the following:
-
-```
-cd "${MANIFESTS_DIR}/katib/katib-crds/base"
-kustomize build . | kubectl apply -f -
-cd "${MANIFESTS_DIR}/katib/katib-controller/base"
-kustomize build . | kubectl apply -f -
-
-```
-
-If you install Katib from Kubeflow manifest repository and you want to use Katib in a cluster that doesn't have a StorageClass for dynamic volume provisioning, you have to create persistent volume manually to bound your persistent volume claim.
-
-This is sample yaml file for creating a persistent volume with local storage:
-
-```yaml
-apiVersion: v1
-kind: PersistentVolume
-metadata:
- name: katib-mysql
- labels:
- type: local
- app: katib
-spec:
- storageClassName: katib
- capacity:
- storage: 10Gi
- accessModes:
- - ReadWriteOnce
- hostPath:
- path: /tmp/katib
-```
-
-Create this PV after deploying Katib package
-
Check if all components are running successfully:
```
@@ -246,7 +259,6 @@ tf-job-operator-796b4747d8-4fh82 1/1 Running 0 21m
### Running examples
-After deploy everything, you can run examples to verify the installation.
+After you deploy everything, you can run examples to verify the installation.
-Examples bellow are for v1beta1 version.
This is an example for TF operator:
@@ -260,161 +272,40 @@ This is an example for PyTorch operator:
kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/pytorchjob-example.yaml
```
-You can check status of experiment
-
-```yaml
-$ kubectl describe experiment tfjob-example -n kubeflow
-
-Name: tfjob-example
-Namespace: kubeflow
-Labels:
-Annotations:
-API Version: kubeflow.org/v1beta1
-Kind: Experiment
-Metadata:
- Creation Timestamp: 2020-07-15T14:27:53Z
- Finalizers:
- update-prometheus-metrics
- Generation: 1
- Resource Version: 100380029
- Self Link: /apis/kubeflow.org/v1beta1/namespaces/kubeflow/experiments/tfjob-example
- UID: 5e3cf1f5-c6a7-11ea-90dd-42010a9a0020
-Spec:
- Algorithm:
- Algorithm Name: random
- Max Failed Trial Count: 3
- Max Trial Count: 12
- Metrics Collector Spec:
- Collector:
- Kind: TensorFlowEvent
- Source:
- File System Path:
- Kind: Directory
- Path: /train
- Objective:
- Goal: 0.99
- Metric Strategies:
- Name: accuracy_1
- Value: max
- Objective Metric Name: accuracy_1
- Type: maximize
- Parallel Trial Count: 3
- Parameters:
- Feasible Space:
- Max: 0.05
- Min: 0.01
- Name: learning_rate
- Parameter Type: double
- Feasible Space:
- Max: 200
- Min: 100
- Name: batch_size
- Parameter Type: int
- Resume Policy: LongRunning
- Trial Template:
- Trial Parameters:
- Description: Learning rate for the training model
- Name: learningRate
- Reference: learning_rate
- Description: Batch Size
- Name: batchSize
- Reference: batch_size
- Trial Spec:
- API Version: kubeflow.org/v1
- Kind: TFJob
- Spec:
- Tf Replica Specs:
- Worker:
- Replicas: 2
- Restart Policy: OnFailure
- Template:
- Spec:
- Containers:
- Command:
- python
- /var/tf_mnist/mnist_with_summaries.py
- --log_dir=/train/metrics
- --learning_rate=${trialParameters.learningRate}
- --batch_size=${trialParameters.batchSize}
- Image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
- Image Pull Policy: Always
- Name: tensorflow
-Status:
- Completion Time: 2020-07-15T14:30:52Z
- Conditions:
- Last Transition Time: 2020-07-15T14:27:53Z
- Last Update Time: 2020-07-15T14:27:53Z
- Message: Experiment is created
- Reason: ExperimentCreated
- Status: True
- Type: Created
- Last Transition Time: 2020-07-15T14:30:52Z
- Last Update Time: 2020-07-15T14:30:52Z
- Message: Experiment is running
- Reason: ExperimentRunning
- Status: False
- Type: Running
- Last Transition Time: 2020-07-15T14:30:52Z
- Last Update Time: 2020-07-15T14:30:52Z
- Message: Experiment has succeeded because Objective goal has reached
- Reason: ExperimentGoalReached
- Status: True
- Type: Succeeded
- Current Optimal Trial:
- Best Trial Name: tfjob-example-gjxn54vl
- Observation:
- Metrics:
- Latest: 0.966300010681
- Max: 1.0
- Min: 0.103260867298
- Name: accuracy_1
- Parameter Assignments:
- Name: learning_rate
- Value: 0.015945204040626416
- Name: batch_size
- Value: 184
- Start Time: 2020-07-15T14:27:53Z
- Succeeded Trial List:
- tfjob-example-5jd8nnjg
- tfjob-example-bgjfpd5t
- tfjob-example-gjxn54vl
- tfjob-example-vpdqxkch
- tfjob-example-wvptx7gt
- Trials: 5
- Trials Succeeded: 5
-Events:
-```
+Check the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm)
+to learn how to monitor your `Experiment` status.
-When the spec.Status.Condition becomes ```Succeeded```, the experiment is finished.
-
-You can monitor your results in Katib UI.
-Access Katib UI via Kubeflow dashboard if you have used standard installation or port-forward the `katib-ui` service if you have installed manually.
+You can view your results in the Katib UI.
+If you used the standard installation, access the Katib UI via the Kubeflow
+dashboard. Otherwise, port-forward the `katib-ui` service:
```
kubectl -n kubeflow port-forward svc/katib-ui 8080:80
```
-You can access the Katib UI using this URL: ```http://localhost:8080/katib/```.
+You can access the Katib UI using this URL: `http://localhost:8080/katib/`.
### Katib SDK
-Katib supports Python SDK for v1beta1 and v1alpha3 version.
-
-* See the [Katib v1beta1 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).
+Katib provides a Python SDK:
-* See the [Katib v1alpha3 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1alpha3).
+- Check the [Katib v1beta1 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).
-Run [`gen-sdk.sh`](https://github.com/kubeflow/katib/blob/master/hack/gen-python-sdk/gen-sdk.sh) to update SDK.
+Run `make generate` to update the Katib SDK.
### Cleanups
-To delete installed TF and PyTorch operator run `kubectl delete -f` on the respective folders.
+To delete the installed TF and PyTorch operators, run `kubectl delete -f`
+on the respective folders.
-To delete Katib for v1beta1 version run `bash katib/scripts/v1beta1/undeploy.sh`.
+To delete Katib, run `make undeploy`.
## Quick Start
-Please see [Quick Start Guide](./docs/quick-start.md).
+Please follow the
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#examples)
+to submit your first Katib experiment.
-## Who are using Katib?
+## Who is using Katib?
@@ -422,18 +313,16 @@ Please see [ADOPTERS.md](ADOPTERS.md).
## CONTRIBUTING
-Please feel free to test the system! [developer-guide.md](./docs/developer-guide.md) is a good starting point for developers.
-
-[1]: https://en.wikipedia.org/wiki/Hyperparameter_optimization
-[2]: https://en.wikipedia.org/wiki/Neural_architecture_search
-[3]: https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/bcb15507f4b52991a0783013df4222240e942381.pdf
+Please feel free to test the system!
+[developer-guide.md](./docs/developer-guide.md) is a good starting point
+for developers.
## Citation
If you use Katib in a scientific publication, we would appreciate
citations to the following paper:
-[A Scalable and Cloud-Native Hyperparameter Tuning System](https://arxiv.org/abs/2006.02085), George *et al.*, arXiv:2006.02085, 2020.
+[A Scalable and Cloud-Native Hyperparameter Tuning System](https://arxiv.org/abs/2006.02085), George _et al._, arXiv:2006.02085, 2020.
Bibtex entry:
diff --git a/docs/algorithm-settings.md b/docs/algorithm-settings.md
deleted file mode 100644
index e2ab437247e..00000000000
--- a/docs/algorithm-settings.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Hyperparameter Tuning Algorithms
-
-Table of Contents
-=================
-
- * [Hyperparameter Tuning Algorithms](#hyperparameter-tuning-algorithms)
- * [Table of Contents](#table-of-contents)
- * [Grid Search](#grid-search)
- * [Chocolate](#chocolate)
- * [Random Search](#random-search)
- * [Hyperopt](#hyperopt)
- * [TPE](#tpe)
- * [Hyperopt](#hyperopt-1)
- * [Bayesian Optimization](#bayesian-optimization)
- * [scikit-optimize](#scikit-optimize)
- * [References](#references)
-
-Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)
-
-
-
-
-
-For information about the hyperparameter tuning algorithms and neural
-architecture search implemented or integrated in Katib, see the detailed guide
-to [configuring and running a Katib
-experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/).
-For information about supported algorithms in Katib, see the [Katib configuration settings](https://kubeflow.org/docs/components/hyperparameter-tuning/katib-config/#suggestion-settings).
diff --git a/docs/images/quickstart-trial.png b/docs/images/quickstart-trial.png
deleted file mode 100644
index e763ce030ec..00000000000
Binary files a/docs/images/quickstart-trial.png and /dev/null differ
diff --git a/docs/images/quickstart.png b/docs/images/quickstart.png
deleted file mode 100644
index 5ecae64d65a..00000000000
Binary files a/docs/images/quickstart.png and /dev/null differ
diff --git a/docs/quick-start.md b/docs/quick-start.md
deleted file mode 100644
index d619cbbc442..00000000000
--- a/docs/quick-start.md
+++ /dev/null
@@ -1,176 +0,0 @@
-# Quick Start
-
-Katib is a Kubernetes Native System for [Hyperparameter Tuning][1] and [Neural Architecture Search][2]. This short introduction illustrates how to use Katib to:
-
-- Define a hyperparameter tuning experiment.
-- Evaluate it using the resources in Kubernetes.
-- Get the best hyperparameter combination in all these trials.
-
-## Requirements
-
-Before you run the hyperparameter tuning experiment, you need to have:
-
-- A Kubernetes cluster with [installed TF operator and Katib](https://github.com/kubeflow/katib#installation)
-
-## Katib in Kubeflow
-
-See the following guides in the Kubeflow documentation:
-
-* [Concepts](https://www.kubeflow.org/docs/components/hyperparameter-tuning/overview/)
- in Katib, hyperparameter tuning, and neural architecture search.
-* [Getting started with Katib](https://kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/).
-* Detailed guide to [configuring and running a Katib
- experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/).
-
-## Hyperparameter Tuning on MNIST
-
-Katib supports multiple [Machine Learning Frameworks](https://en.wikipedia.org/wiki/Comparison_of_deep-learning_software) (e.g. TensorFlow, PyTorch, MXNet, and XGBoost).
-
-In this quick start guide, we demonstrate how to use TensorFlow in Katib, which is one of the most popular framework among the world, to run a hyperparameter tuning job on MNIST.
-
-### Package Training Code
-
-The first thing we need to do is to package the training code to a docker image. We use the [example code](https://github.com/kubeflow/tf-operator/blob/master/examples/v1/mnist_with_summaries/mnist_with_summaries.py), which builds a simple neural network, to train on MNIST. The code trains the network and outputs the TFEvents to `/tmp` by default.
-
-You can use our prebuilt image `gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0`. Thus we can skip it.
-
-### Create the Experiment
-
-If you want to use Katib to automatically tune hyperparameters, you need to define the `Experiment`, which is a CRD that represents a single optimization run over a feasible space. Each `Experiment` contains:
-
-1. Configuration about parallelism: The configuration about the parallelism.
-1. Objective: The metric that we want to optimize.
-1. Search space: The name and the distribution (discrete valued or continuous valued) of all the hyperparameters you need to search.
-1. Search algorithm: The algorithm (e.g. Random Search, Grid Search, TPE, Bayesian Optimization) used to find the best hyperparameters.
-1. Trial Template: The template used to define the trial.
-1. Metrics Collection: Definition about how to collect the metrics (e.g. accuracy, loss).
-
-The `Experiment`'s definition is defined here:
-
-
- Click here to get YAML configuration
-
-```yaml
-apiVersion: "kubeflow.org/v1beta1"
-kind: Experiment
-metadata:
- namespace: kubeflow
- name: tfjob-example
-spec:
- parallelTrialCount: 3
- maxTrialCount: 12
- maxFailedTrialCount: 3
- objective:
- type: maximize
- goal: 0.99
- objectiveMetricName: accuracy_1
- algorithm:
- algorithmName: random
- metricsCollectorSpec:
- source:
- fileSystemPath:
- path: /train
- kind: Directory
- collector:
- kind: TensorFlowEvent
- parameters:
- - name: learning_rate
- parameterType: double
- feasibleSpace:
- min: "0.01"
- max: "0.05"
- - name: batch_size
- parameterType: int
- feasibleSpace:
- min: "100"
- max: "200"
- trialTemplate:
- trialParameters:
- - name: learningRate
- description: Learning rate for the training model
- reference: learning_rate
- - name: batchSize
- description: Batch Size
- reference: batch_size
- trialSpec:
- apiVersion: "kubeflow.org/v1"
- kind: TFJob
- spec:
- tfReplicaSpecs:
- Worker:
- replicas: 2
- restartPolicy: OnFailure
- template:
- spec:
- containers:
- - name: tensorflow
- image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
- imagePullPolicy: Always
- command:
- - "python"
- - "/var/tf_mnist/mnist_with_summaries.py"
- - "--log_dir=/train/metrics"
- - "--learning_rate=${trialParameters.learningRate}"
- - "--batch_size=${trialParameters.batchSize}"
-
-```
-
-The experiment has two hyperparameters defined in `parameters`: `learning_rate` and `batch_size`. We decide to use random search algorithm, and collect metrics from the TF Events.
-
-
-
-Or you could just run:
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/tfjob-example.yaml
-```
-
-### Get trial results
-
-You can get the trial results using the command (Need to install [`jq`](https://stedolan.github.io/jq/download/) to parse JSON):
-
-```bash
-kubectl -n kubeflow get trials -o json | jq ".items[] | {assignments: .spec.parameterAssignments, observation: .status.observation}"
-```
-
-You should get the output:
-
-```json
-...
-{
- "assignments": [
- {
- "name": "learning_rate",
- "value": "0.01156268890324629"
- },
- {
- "name": "batch_size",
- "value": "196"
- }
- ],
- "observation": {
- "metrics": [
- {
- "latest": "0.968200027943",
- "max": "1.0",
- "min": "0.0714285746217",
- "name": "accuracy_1"
- }
- ]
- }
-}
-...
-```
-
-Or you could get the result in UI: `/katib/#/katib/hp_monitor/kubeflow/tfjob-example`.
-
-![](./images/quickstart.png)
-
-When you click the trial name, you should get the details about metrics:
-
-![](./images/quickstart-trial.png)
-
-
-
-[1]: https://en.wikipedia.org/wiki/Hyperparameter_optimization
-[2]: https://en.wikipedia.org/wiki/Neural_architecture_search
\ No newline at end of file
diff --git a/docs/user-guide.md b/docs/user-guide.md
deleted file mode 100644
index 4f2ef3e38d1..00000000000
--- a/docs/user-guide.md
+++ /dev/null
@@ -1,3 +0,0 @@
-See the detailed guide to [configuring and running a Katib
-experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/)
-in the Kubeflow docs.
diff --git a/docs/workflow-design.md b/docs/workflow-design.md
index 89075f4be13..d9e6194f2d9 100644
--- a/docs/workflow-design.md
+++ b/docs/workflow-design.md
@@ -1,17 +1,28 @@
-# How Katib v1beta1 tunes hyperparameter automatically in a Kubernetes native way
+# How Katib v1beta1 tunes hyperparameters automatically in a Kubernetes-native way
-See the following guides in the Kubeflow documentation:
+Follow the Kubeflow documentation guides:
-* [Concepts](https://www.kubeflow.org/docs/components/hyperparameter-tuning/overview/)
+- [Concepts](https://www.kubeflow.org/docs/components/katib/overview/)
in Katib, hyperparameter tuning, and neural architecture search.
-* [Getting started with Katib](https://kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/).
-* Detailed guide to [configuring and running a Katib
- experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/).
+- [Getting started with Katib](https://kubeflow.org/docs/components/katib/hyperparameter/).
+- Detailed guide to
+ [configuring and running a Katib `Experiment`](https://kubeflow.org/docs/components/katib/experiment/).
## Example and Illustration
-After install Katib v1beta1, you can run `kubectl apply -f katib/examples/v1beta1/random-example.yaml` to try the first example of Katib.
-Then you can get the new `Experiment` as below. Katib concepts will be introduced based on this example.
+After installing Katib v1beta1, you can run
+`kubectl apply -f katib/examples/v1beta1/random-example.yaml` to try the first
+Katib example.
+
+### Experiment
+
+When you want to tune the hyperparameters of your machine learning model
+before training it further, you just need to create an `Experiment` CR. To
+learn what fields are included in the `Experiment.spec`, follow
+the detailed guide to
+[configuring and running a Katib `Experiment`](https://kubeflow.org/docs/components/katib/experiment/).
+Then you can get the new `Experiment` as below.
+Katib concepts are introduced based on this example.
```yaml
$ kubectl get experiment random-example -n kubeflow -o yaml
@@ -63,6 +74,9 @@ spec:
parameterType: categorical
resumePolicy: LongRunning
trialTemplate:
+ failureCondition: status.conditions.#(type=="Failed")#|#(status=="True")#
+ primaryContainerName: training-container
+ successCondition: status.conditions.#(type=="Complete")#|#(status=="True")#
trialParameters:
- description: Learning rate for the training model
name: learningRate
@@ -87,48 +101,180 @@ spec:
- --lr=${trialParameters.learningRate}
- --num-layers=${trialParameters.numberLayers}
- --optimizer=${trialParameters.optimizer}
- image: docker.io/kubeflowkatib/mxnet-mnist
+ image: docker.io/kubeflowkatib/mxnet-mnist:v1beta1-e294a90
name: training-container
restartPolicy: Never
status:
- ...
+ completionTime: "2020-11-16T20:13:02Z"
+ conditions:
+ - lastTransitionTime: "2020-11-16T20:00:15Z"
+ lastUpdateTime: "2020-11-16T20:00:15Z"
+ message: Experiment is created
+ reason: ExperimentCreated
+ status: "True"
+ type: Created
+ - lastTransitionTime: "2020-11-16T20:13:02Z"
+ lastUpdateTime: "2020-11-16T20:13:02Z"
+ message: Experiment is running
+ reason: ExperimentRunning
+ status: "False"
+ type: Running
+ - lastTransitionTime: "2020-11-16T20:13:02Z"
+ lastUpdateTime: "2020-11-16T20:13:02Z"
+ message: Experiment has succeeded because max trial count has reached
+ reason: ExperimentMaxTrialsReached
+ status: "True"
+ type: Succeeded
+ currentOptimalTrial:
+ bestTrialName: random-example-gnz5nccf
+ observation:
+ metrics:
+ - latest: "0.979299"
+ max: "0.979299"
+ min: "0.955115"
+ name: Validation-accuracy
+ - latest: "0.993503"
+ max: "0.993503"
+ min: "0.912413"
+ name: Train-accuracy
+ parameterAssignments:
+ - name: lr
+ value: "0.01874909352953323"
+ - name: num-layers
+ value: "5"
+ - name: optimizer
+ value: sgd
+ startTime: "2020-11-16T20:00:15Z"
+ succeededTrialList:
+ - random-example-2fpnqfv8
+ - random-example-2s9vfb9s
+ - random-example-5hxm45x4
+ - random-example-8xmpj4gv
+ - random-example-b6gnl4cs
+ - random-example-ftm2v84q
+ - random-example-gnz5nccf
+ - random-example-p74tn9gk
+ - random-example-q6jrlshx
+ - random-example-tkk46c4x
+ - random-example-w5qgblgk
+ - random-example-xcnrpx4x
+ trials: 12
+ trialsSucceeded: 12
```
-#### Experiment
-When you want to tune hyperparameters for your machine learning model before
-training it further, you just need to create an `Experiment` CR like above. To
-learn what fields are included in the `Experiment.spec`, see
-the detailed guide to [configuring and running a Katib
-experiment](https://kubeflow.org/docs/components/hyperparameter-tuning/experiment/).
+### Suggestion
-#### Trial
+Katib internally creates a `Suggestion` CR for each `Experiment` CR. The
+`Suggestion` CR includes the hyperparameter algorithm name in the `algorithmName`
+field and the number of hyperparameter sets that Katib should generate in the
+`requests` field. The `Suggestion` also tracks all already generated sets of
+hyperparameters in `status.suggestions`. The `Suggestion` CR is used for internal
+logic control, and the end user can even ignore it.
-For each set of hyperparameters, Katib will internally generate a `Trial` CR with the hyperparameters key-value pairs, job manifest string with parameters instantiated and some other fields like below. `Trial` CR is used for internal logic control, and end user can even ignore it.
+```yaml
+$ kubectl get suggestion random-example -n kubeflow -o yaml
+
+apiVersion: kubeflow.org/v1beta1
+kind: Suggestion
+metadata:
+ ...
+ name: random-example
+ namespace: kubeflow
+ ownerReferences:
+ - apiVersion: kubeflow.org/v1beta1
+ blockOwnerDeletion: true
+ controller: true
+ kind: Experiment
+ name: random-example
+ uid: 302e79ae-8659-4679-9e2d-461209619883
+ ...
+spec:
+ algorithm:
+ algorithmName: random
+ requests: 12
+ resumePolicy: LongRunning
+status:
+ conditions:
+ - lastTransitionTime: "2020-11-16T20:00:15Z"
+ lastUpdateTime: "2020-11-16T20:00:15Z"
+ message: Suggestion is created
+ reason: SuggestionCreated
+ status: "True"
+ type: Created
+ - lastTransitionTime: "2020-11-16T20:00:36Z"
+ lastUpdateTime: "2020-11-16T20:00:36Z"
+ message: Deployment is ready
+ reason: DeploymentReady
+ status: "True"
+ type: DeploymentReady
+ - lastTransitionTime: "2020-11-16T20:00:38Z"
+ lastUpdateTime: "2020-11-16T20:00:38Z"
+ message: Suggestion is running
+ reason: SuggestionRunning
+ status: "True"
+ type: Running
+ startTime: "2020-11-16T20:00:15Z"
+ suggestionCount: 12
+ suggestions:
+ ...
+ - name: random-example-2fpnqfv8
+ parameterAssignments:
+ - name: lr
+ value: "0.021135228357807213"
+ - name: num-layers
+ value: "4"
+ - name: optimizer
+ value: sgd
+ - name: random-example-xcnrpx4x
+ parameterAssignments:
+ - name: lr
+ value: "0.02414696373094622"
+ - name: num-layers
+ value: "3"
+ - name: optimizer
+ value: adam
+ - name: random-example-8xmpj4gv
+ parameterAssignments:
+ - name: lr
+ value: "0.02471053882990492"
+ - name: num-layers
+ value: "4"
+ - name: optimizer
+ value: sgd
+ ...
+```
+
+### Trial
+
+For each set of hyperparameters, Katib internally generates a `Trial` CR
+with the hyperparameter key-value pairs, a `Worker Job` run specification with
+the parameters instantiated, and some other fields, as shown below. The `Trial`
+CR is used for internal logic control, and the end user can even ignore it.
```yaml
$ kubectl get trial -n kubeflow
NAME TYPE STATUS AGE
-random-example-58tbx6xc Succeeded True 14m
-random-example-5nkb2gz2 Succeeded True 21m
-random-example-88bdbkzr Succeeded True 20m
-random-example-9tgjl9nt Succeeded True 17m
-random-example-dqzjb2r9 Succeeded True 19m
-random-example-gjfdgxxn Succeeded True 20m
-random-example-nhrx8tb8 Succeeded True 15m
-random-example-nkv76z8z Succeeded True 18m
-random-example-pcnmzl76 Succeeded True 21m
-random-example-spmk57dw Succeeded True 14m
-random-example-tvxz667x Succeeded True 16m
-random-example-xpw8wnjc Succeeded True 21m
-
-$ kubectl get trial random-example-gjfdgxxn -o yaml -n kubeflow
+random-example-2fpnqfv8 Succeeded True 10m
+random-example-2s9vfb9s Succeeded True 8m15s
+random-example-5hxm45x4 Succeeded True 17m
+random-example-8xmpj4gv Succeeded True 8m44s
+random-example-b6gnl4cs Succeeded True 12m
+random-example-ftm2v84q Succeeded True 17m
+random-example-gnz5nccf Succeeded True 14m
+random-example-p74tn9gk Succeeded True 11m
+random-example-q6jrlshx Succeeded True 17m
+random-example-tkk46c4x Succeeded True 12m
+random-example-w5qgblgk Succeeded True 12m
+random-example-xcnrpx4x Succeeded True 10m
+
+$ kubectl get trial random-example-2fpnqfv8 -o yaml -n kubeflow
apiVersion: kubeflow.org/v1beta1
kind: Trial
metadata:
...
- name: random-example-gjfdgxxn
+ name: random-example-2fpnqfv8
namespace: kubeflow
ownerReferences:
- apiVersion: kubeflow.org/v1beta1
@@ -136,9 +282,10 @@ metadata:
controller: true
kind: Experiment
name: random-example
- uid: 34349cb7-c6af-11ea-90dd-42010a9a0020
+ uid: 302e79ae-8659-4679-9e2d-461209619883
...
spec:
+ failureCondition: status.conditions.#(type=="Failed")#|#(status=="True")#
metricsCollector:
collector:
kind: StdOut
@@ -155,16 +302,17 @@ spec:
type: maximize
parameterAssignments:
- name: lr
- value: "0.012171302435678337"
+ value: "0.021135228357807213"
- name: num-layers
- value: "3"
+ value: "4"
- name: optimizer
- value: adam
+ value: sgd
+ primaryContainerName: training-container
runSpec:
apiVersion: batch/v1
kind: Job
metadata:
- name: random-example-gjfdgxxn
+ name: random-example-2fpnqfv8
namespace: kubeflow
spec:
template:
@@ -174,117 +322,95 @@ spec:
- python3
- /opt/mxnet-mnist/mnist.py
- --batch-size=64
- - --lr=0.012171302435678337
- - --num-layers=3
- - --optimizer=adam
- image: docker.io/kubeflowkatib/mxnet-mnist
+ - --lr=0.021135228357807213
+ - --num-layers=4
+ - --optimizer=sgd
+ image: docker.io/kubeflowkatib/mxnet-mnist:v1beta1-e294a90
name: training-container
restartPolicy: Never
+ successCondition: status.conditions.#(type=="Complete")#|#(status=="True")#
status:
- completionTime: "2020-07-15T15:29:00Z"
+ completionTime: "2020-11-16T20:09:33Z"
conditions:
- - lastTransitionTime: "2020-07-15T15:25:16Z"
- lastUpdateTime: "2020-07-15T15:25:16Z"
+ - lastTransitionTime: "2020-11-16T20:07:48Z"
+ lastUpdateTime: "2020-11-16T20:07:48Z"
message: Trial is created
reason: TrialCreated
status: "True"
type: Created
- - lastTransitionTime: "2020-07-15T15:29:00Z"
- lastUpdateTime: "2020-07-15T15:29:00Z"
+ - lastTransitionTime: "2020-11-16T20:09:33Z"
+ lastUpdateTime: "2020-11-16T20:09:33Z"
message: Trial is running
reason: TrialRunning
status: "False"
type: Running
- - lastTransitionTime: "2020-07-15T15:29:00Z"
- lastUpdateTime: "2020-07-15T15:29:00Z"
+ - lastTransitionTime: "2020-11-16T20:09:33Z"
+ lastUpdateTime: "2020-11-16T20:09:33Z"
message: Trial has succeeded
reason: TrialSucceeded
status: "True"
type: Succeeded
observation:
metrics:
- - latest: "0.959594"
- max: "0.960490"
- min: "0.940585"
+ - latest: "0.977309"
+ max: "0.978105"
+ min: "0.958002"
name: Validation-accuracy
- - latest: "0.959022"
- max: "0.959188"
- min: "0.921658"
+ - latest: "0.993820"
+ max: "0.993820"
+ min: "0.916611"
name: Train-accuracy
- startTime: "2020-07-15T15:25:16Z"
+ startTime: "2020-11-16T20:07:48Z"
```
-#### Suggestion
+## What happens after an `Experiment` CR is created
-Katib will internally create a `Suggestion` CR for each `Experiment` CR. `Suggestion` CR includes the hyperparameter algorithm name by `algorithmName` field and how many sets of hyperparameter Katib asks to be generated by `requests` field. The CR also traces all already generated sets of hyperparameter in `status.suggestions`. Same as `Trial`, `Suggestion` CR is used for internal logic control and end user can even ignore it.
+When a user creates an `Experiment` CR, the Katib `Experiment`, `Suggestion`
+and `Trial` controllers work together to tune the hyperparameters of the
+user's machine learning model. The `Experiment` workflow looks as follows:
-```yaml
-$ kubectl get suggestion random-example -n kubeflow -o yaml
-
-apiVersion: kubeflow.org/v1beta1
-kind: Suggestion
-metadata:
- ...
- name: random-example
- namespace: kubeflow
- ownerReferences:
- - apiVersion: kubeflow.org/v1beta1
- blockOwnerDeletion: true
- controller: true
- kind: Experiment
- name: random-example
- uid: 34349cb7-c6af-11ea-90dd-42010a9a0020
- ...
-spec:
- algorithmName: random
- requests: 12
-status:
- suggestionCount: 12
- suggestions:
- ...
- - name: random-example-gjfdgxxn
- parameterAssignments:
- - name: lr
- value: "0.012171302435678337"
- - name: num-layers
- value: "3"
- - name: optimizer
- value: adam
- - name: random-example-88bdbkzr
- parameterAssignments:
- - name: lr
- value: "0.013408352284328112"
- - name: num-layers
- value: "4"
- - name: optimizer
- value: ftrl
- - name: random-example-dqzjb2r9
- parameterAssignments:
- - name: lr
- value: "0.028873905258692753"
- - name: num-layers
- value: "3"
- - name: optimizer
- value: adam
- ...
-```
-
-## What happens after an `Experiment` CR created
-
-When a user created an `Experiment` CR, Katib controllers including experiment controller, trial controller and suggestion controller will work together to achieve hyperparameters tuning for user Machine learning model.
-1. A `Experiment` CR is submitted to Kubernetes API server, Katib experiment mutating and validating webhook will be called to set default value for the `Experiment` CR and validate the CR separately.
-2. Experiment controller creates a `Suggestion` CR.
-3. Suggestion controller creates the algorithm deployment and service based on the new `Suggestion` CR.
-4. When Suggestion controller verifies that the algorithm service is ready, it calls the service to generate `spec.request - len(status.suggestions)` sets of hyperparamters and append them into `status.suggestions`
-5. Experiment controller finds that `Suggestion` CR had been updated, then generate each `Trial` for each new hyperparamters set.
-6. Trial controller generates job based on `trialSpec` manifest with the new hyperparamters set.
-7. Related job controller (Kubernetes batch Job, Kubeflow PyTorchJob or Kubeflow TFJob) generates Pods.
-8. Katib Pod mutating webhook is called to inject metrics collector sidecar container to the candidate Pod.
-9. During the ML model container runs, metrics collector container in the same Pod tries to collect metrics from it and persists them into Katib DB backend.
-10. When the ML model Job ends, Trial controller will update status of the corresponding `Trial` CR.
-11. When a `Trial` CR goes to end, Experiment controller will increase `request` field of corresponding
-`Suggestion` CR if it is needed, then everything goes to `step 4` again. Of course, if `Trial` CRs meet one of `end` condition (exceeds `maxTrialCount`, `maxFailedTrialCount` or `goal`), Experiment controller will take everything done.
+1. The `Experiment` CR is submitted to the Kubernetes API server. The Katib
+   `Experiment` mutating and validating webhooks are called to set the default
+   values for the `Experiment` CR and to validate the CR, respectively.
+
+1. The `Experiment` controller creates the `Suggestion` CR.
+
+1. The `Suggestion` controller creates the algorithm deployment and service
+ based on the new `Suggestion` CR.
+
+1. When the `Suggestion` controller verifies that the algorithm service is
+   ready, it calls the service to generate
+   `spec.requests - len(status.suggestions)` sets of hyperparameters and
+   appends them to `status.suggestions` (see the sketch after this list).
+
+1. The `Experiment` controller finds that the `Suggestion` CR has been updated
+   and generates a `Trial` for each new set of hyperparameters.
+
+1. The `Trial` controller generates a `Worker Job` based on the `runSpec`
+   from the `Trial` CR, with the new hyperparameters instantiated.
+
+1. The related job controller
+ (Kubernetes batch Job, Kubeflow TFJob, Tekton Pipeline, etc.) generates
+ Kubernetes Pods.
+
+1. The Katib Pod mutating webhook is called to inject the metrics collector
+   sidecar container into the candidate Pods.
+
+1. While the ML model container runs, the metrics collector container
+   collects metrics from the injected Pod and persists them to the Katib
+   DB backend.
+
+1. When the ML model training ends, the `Trial` controller updates the status
+   of the corresponding `Trial` CR.
+
+1. When a `Trial` CR finishes, the `Experiment` controller increases the
+   `requests` field of the corresponding `Suggestion` CR if needed, and the
+   workflow returns to step 4. If the `Experiment` meets one of the end
+   conditions (it exceeds `maxTrialCount` or `maxFailedTrialCount`, or reaches
+   the `goal`), the `Experiment` controller marks the `Experiment` as
+   completed.
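+
+As a minimal sketch of the arithmetic in step 4, consider a trimmed
+`Suggestion` CR. The names and values below are illustrative, not taken from
+a real cluster:
+
+```yaml
+spec:
+  algorithm:
+    algorithmName: random
+  requests: 6
+status:
+  suggestionCount: 4
+  # The algorithm service is asked for spec.requests - len(status.suggestions),
+  # in this case 6 - 4 = 2 new sets of hyperparameters.
+  suggestions:
+    - name: random-example-aaaaaaaa
+      parameterAssignments:
+        - name: lr
+          value: "0.02"
+    # ... three more generated sets ...
+```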