Merge pull request #241 from microsoft/master
merge master
SparkSnail authored Apr 16, 2020
2 parents b4773e1 + f8d42a3 commit 6728799
Showing 74 changed files with 1,988 additions and 4,230 deletions.
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/studentProgram.md
@@ -0,0 +1,29 @@
---
name: Question for NNI Student Program China / NNI Student Program question form
about: NNI Student Program China issue template on GitHub

---
<!-- Here is an issue template for the NNI Student Program China. You are encouraged to raise concerns about any issue and share your ideas about NNI or our student program. Both Chinese and English are acceptable.
If it is a general question / idea about NNI, you can just give a short summary.
If it is an operational issue, please fill in the operational issue template and provide as many details as possible. Not doing so may result in your bug not being addressed in a timely manner. Thanks! -->

## Issue Summary

**Please briefly summarize your question / idea**:


## Other Suggestions

**Need to update documentation (yes / no)**:
**Anything else you'd like to share**:

## General Question

**Short summary about the question / idea**:


## Other Advice
**Need to update document (yes / no)**:
**Anything else we need to know**:
8 changes: 5 additions & 3 deletions README.md
@@ -25,7 +25,7 @@ The tool manages automated machine learning (AutoML) experiments, **dispatches a
* Researchers and data scientists who want to easily **implement and experiment with new AutoML algorithms**, be it a hyperparameter tuning algorithm, a neural architecture search algorithm, or a model compression algorithm.
* ML Platform owners who want to **support AutoML in their platform**.

### **NNI v1.4 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
### **NNI v1.5 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**

## **NNI capabilities in a glance**

@@ -108,6 +108,7 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Evolution">Naïve Evolution</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Anneal">Anneal</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Hyperband">Hyperband</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#PBTTuner">PBT</a></li>
</ul>
<b>Bayesian optimization</b>
<ul>
Expand All @@ -131,7 +132,8 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/NAS/CDARTS.md">CDARTS</a></li>
<li><a href="docs/en_US/NAS/SPOS.md">SPOS</a></li>
<li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a> </li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="docs/en_US/NAS/TextNAS.md">TextNAS</a></li>
</ul>
</ul>
<a href="docs/en_US/Compressor/Overview.md">Model Compression</a>
@@ -236,7 +238,7 @@ The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is
* Download the examples by cloning the source code.

```bash
git clone -b v1.4 https://github.com/Microsoft/nni.git
git clone -b v1.5 https://github.com/Microsoft/nni.git
```

* Run the MNIST example.
2 changes: 1 addition & 1 deletion deployment/deployment-pipelines.yml
@@ -112,7 +112,7 @@ jobs:
  dependsOn: version_number_validation
  condition: succeeded()
  pool:
    vmImage: 'macOS 10.13'
    vmImage: 'macOS-10.15'
  strategy:
    matrix:
      Python36:
3 changes: 2 additions & 1 deletion deployment/pypi/Makefile
@@ -41,7 +41,8 @@ build:
	cp -r $(CWD)../../src/nni_manager/dist $(CWD)nni
	cp -r $(CWD)../../src/nni_manager/config $(CWD)nni
	cp -r $(CWD)../../src/webui/build $(CWD)nni/static
	cp -r $(CWD)../../src/nasui/build $(CWD)nni/nasui
	mkdir -p $(CWD)nni/nasui/build
	cp -r $(CWD)../../src/nasui/build/. $(CWD)nni/nasui/build
	cp $(CWD)../../src/nasui/server.js $(CWD)nni/nasui
	cp $(CWD)../../src/nni_manager/package.json $(CWD)nni
	sed -ie 's/$(NNI_VERSION_TEMPLATE)/$(NNI_VERSION_VALUE)/' $(CWD)nni/package.json
4 changes: 3 additions & 1 deletion docs/en_US/NAS/NasGuide.md
@@ -119,7 +119,9 @@ trainer.export(file="model_dir/final_architecture.json") # export the final arc

Users can directly run their training file through `python3 train.py` without `nnictl`. After training, users can export the best of the found models through `trainer.export()`.

Normally, the trainer exposes a few arguments that you can customize. For example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usages needs and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible. But there is no guarantee. For example, some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); most trainers do not have support for distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
Normally, the trainer exposes a few arguments that you can customize. For example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage needs and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible. But there is no guarantee. For example, some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); most trainers do not have support for distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to [customize your trainer](./Advanced.md#extend-the-ability-of-one-shot-trainers).

Furthermore, one-shot NAS can be visualized with our NAS UI. [See more details.](./Visualization.md)
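
To make this concrete, here is a minimal, hedged sketch of customizing those trainer arguments, enabling visualization, and exporting the result. It assumes the `DartsTrainer` interface from the DARTS example; the import paths, argument names, and the toy search space below are illustrative and may differ between NNI versions.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

from nni.nas.pytorch import mutables            # assumed import path for NNI 1.x
from nni.nas.pytorch.darts import DartsTrainer  # assumed import path for NNI 1.x


class ToySearchSpace(nn.Module):
    """A tiny searchable model: the trainer chooses between the two convolution candidates."""
    def __init__(self):
        super().__init__()
        self.conv = mutables.LayerChoice([
            nn.Conv2d(3, 16, 3, padding=1),
            nn.Conv2d(3, 16, 5, padding=2),
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.view(x.size(0), -1))


def accuracy(output, target):
    # metrics function: return a dict of named metrics
    _, predicted = torch.max(output, 1)
    return {"acc1": (predicted == target).float().mean().item()}


transform = transforms.ToTensor()
dataset_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
dataset_valid = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

model = ToySearchSpace()
trainer = DartsTrainer(
    model,
    loss=nn.CrossEntropyLoss(),                                # customizable loss
    metrics=accuracy,                                          # customizable metrics
    optimizer=torch.optim.SGD(model.parameters(), lr=0.025),   # customizable optimizer
    num_epochs=10,
    dataset_train=dataset_train,                               # customizable datasets
    dataset_valid=dataset_valid,
)
trainer.enable_visualization()  # optional: write graph.json and log for the NAS UI
trainer.train()
trainer.export(file="model_dir/final_architecture.json")
```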

### Distributed NAS

10 changes: 8 additions & 2 deletions docs/en_US/NAS/Overview.md
@@ -19,17 +19,19 @@ NNI currently supports the NAS algorithms listed below and is adding more. Users
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
| [CDARTS](CDARTS.md) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.|
| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332).|
| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332). It removes the proxy and directly learns architectures for large-scale target tasks and target hardware platforms. |
| [TextNAS](TextNAS.md) | [TextNAS: A Neural Architecture Search Space tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf). It is a neural architecture search algorithm tailored for text representation. |

One-shot algorithms run **standalone without nnictl**. Only the PyTorch version has been implemented. TensorFlow 2.x will be supported in a future release.

Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.

* NNI 1.2+
* tensorboard
* PyTorch 1.2+
* git

One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).

## Supported Distributed NAS Algorithms

|Name|Brief Introduction of Algorithm|
@@ -49,6 +51,10 @@ The programming interface of designing and searching a model is often demanded i

[Here](./NasGuide.md) is the user guide to get started with using NAS on NNI.

## NAS Visualization

To help users track the process and status of how a model is searched under the specified search space, we developed a visualization tool. It visualizes the search space as a super-net and shows the importance of subnets and layers/operations, as well as how that importance changes over the course of the search. Please refer to [the document of NAS visualization](./Visualization.md) for how to use it.

## Reference and Feedback

[1]: https://arxiv.org/abs/1802.03268
69 changes: 69 additions & 0 deletions docs/en_US/NAS/Visualization.md
@@ -0,0 +1,69 @@
# NAS Visualization (Experimental)

## Built-in Trainers Support

Currently, only ENAS and DARTS support visualization. The examples of [ENAS](./ENAS.md) and [DARTS](./DARTS.md) have demonstrated how to enable visualization in your code, namely, adding this before `trainer.train()`:

```python
trainer.enable_visualization()
```

This will create a directory `logs/<current_time_stamp>` in your working folder, in which you will find two files `graph.json` and `log`.

You don't have to wait until your program finishes to launch the NAS UI, but it's important that these two files have already been created. Launch the NAS UI with

```bash
nnictl webui nas --logdir logs/<current_time_stamp> --port <port>
```

## Visualize a Customized Trainer

If you are interested in how to customize a trainer, please read this [doc](./Advanced.md#extend-the-ability-of-one-shot-trainers).

You need to make two modifications to an existing trainer to enable visualization:

1. Export your graph before training, with

```python
vis_graph = self.mutator.graph(inputs)
# `inputs` is a dummy input to your model. For example, torch.randn((1, 3, 32, 32)).cuda()
# If your model has multiple inputs, it should be a tuple.
with open("/path/to/your/logdir/graph.json", "w") as f:
    json.dump(vis_graph, f)
```

2. Log the choices you've made. You can do this once per epoch, once per mini-batch, or at whatever frequency you'd like.

```python
def __init__(self):
    # ...
    self.status_writer = open("/path/to/your/logdir/log", "w")  # create a writer

def train(self):
    # ...
    print(json.dumps(self.mutator.status()), file=self.status_writer, flush=True)  # dump a record of status
```

If you are implementing a customized trainer that inherits `Trainer`, we have provided `enable_visualization()` and `_write_graph_status()` for convenience. All you need to do is call `trainer.enable_visualization()` before training starts, and `trainer._write_graph_status()` each time you want to log. But remember that both of these APIs are experimental and subject to change in the future.
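
For illustration, here is a rough, hedged sketch of how these two helpers can fit into a customized trainer. Only `enable_visualization()`, `_write_graph_status()`, and the base `Trainer` class come from the description above; `mutator.reset()` is an assumed mutator API, and the class body, its attributes (such as `self.train_loader`), and the constructor arguments are hypothetical placeholders.

```python
from nni.nas.pytorch.trainer import Trainer  # assumed import path for NNI 1.x


class MyTrainer(Trainer):
    # __init__, validate_one_epoch, etc. are omitted here for brevity;
    # a real trainer must implement them.

    def train_one_epoch(self, epoch):
        for x, y in self.train_loader:        # hypothetical data-loader attribute
            self.mutator.reset()              # sample a new architecture (assumed mutator API)
            self.optimizer.zero_grad()
            loss = self.loss(self.model(x), y)
            loss.backward()
            self.optimizer.step()
        self._write_graph_status()            # log the current choices once per epoch


trainer = MyTrainer(...)            # constructor arguments depend on your trainer
trainer.enable_visualization()      # creates the log directory and dumps graph.json before training
trainer.train()                     # base-class loop that calls train_one_epoch()
```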

Last but not least, invoke the NAS UI with

```bash
nnictl webui nas --logdir /path/to/your/logdir
```

## NAS UI Preview

![](../../img/nasui-1.png)

![](../../img/nasui-2.png)

## Limitations

* NAS visualization only works with PyTorch >=1.4. We've tested it on PyTorch 1.3.1 and it doesn't work.
* We rely on PyTorch's TensorBoard support for graph export, which in turn relies on `torch.jit`. It will not work if your model doesn't support `jit`.
* There are known performance issues when loading a moderate-size graph with many op choices (like DARTS search space).

## Feedback

NAS UI is currently experimental. We welcome your feedback. [Here](https://github.com/microsoft/nni/pull/2085) we have listed all the future to-do items for the NAS UI. Feel free to comment (or [submit a new issue](https://github.com/microsoft/nni/issues/new?template=enhancement.md)) if you have other suggestions.
42 changes: 42 additions & 0 deletions docs/en_US/Release.md
@@ -1,5 +1,47 @@
# ChangeLog

## Release 1.5 - 4/13/2020

### New Features and Documentation

#### Hyper-Parameter Optimizing

* New tuner: [Population Based Training (PBT)](https://github.com/microsoft/nni/blob/master/docs/en_US/Tuner/PBTTuner.md)
* Trials can now report infinity and NaN as results

#### Neural Architecture Search

* New NAS algorithm: [TextNAS](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/TextNAS.md)
* ENAS and DARTS now support [visualization](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/Visualization.md) through the web UI.

#### Model Compression

* New Pruner: [GradientRankFilterPruner](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/Pruner.md#gradientrankfilterpruner)
* Compressors will validate configuration by default
* Refactor: added the optimizer as an input argument of the pruner, for easy support of DataParallel and more efficient iterative pruning. This is a breaking change for the usage of iterative pruning algorithms.
* Model compression examples are refactored and improved
* Added documentation for [implementing compressing algorithm](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/Framework.md)

#### Training Service

* Kubeflow now supports PyTorchJob CRD v1 (thanks external contributor @jiapinai)
* Experimental [DLTS](https://github.com/microsoft/nni/blob/master/docs/en_US/TrainingService/DLTSMode.md) support

#### Overall Documentation Improvement

* Documentation is significantly improved on grammar, spelling, and wording (thanks external contributor @AHartNtkn)

### Fixed Bugs

* ENAS cannot have more than one LSTM layer (thanks external contributor @marsggbo)
* NNI manager's timers will never unsubscribe (thanks external contributor @guilhermehn)
* NNI manager may exhaust heap memory (thanks external contributor @Sundrops)
* Batch tuner does not support customized trials (#2075)
* Experiment cannot be killed if it failed on start (#2080)
* Non-number type metrics break web UI (#2278)
* A bug in lottery ticket pruner
* Other minor glitches

## Release 1.4 - 2/19/2020

### Major Features
8 changes: 6 additions & 2 deletions docs/en_US/Tuner/BuiltinTuner.md
@@ -463,13 +463,13 @@ tuner:

**Suggested scenario**

Population Based Training (PBT) which bridges and extends parallel search methods and sequential optimization methods. It has a wallclock run time that is no greater than that of a single optimization process, does not require sequential runs, and is also able to use fewer computational resources than naive search methods. Therefore, it's effective when you want to save computational resources and time. Besides, PBT returns hyperparameter scheduler instead of configuration. If you don't need to get a specific configuration, but just expect good results, you can choose this tuner. It should be noted that, in our implementation, the operation of checkpoint storage location is involved. A trial is considered as several traning epochs of training, so the loading and saving of checkpoint must be specified in the trial code, which is different with other tuners. Otherwise, if the experiment is not local mode, users should provide a path in a shared storage which can be accessed by all the trials. You could try it on very simple task, such as the [mnist-pbt-tuner-pytorch](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-pbt-tuner-pytorch) example. [See details](./PBTTuner.md)
Population Based Training (PBT) bridges and extends parallel search methods and sequential optimization methods. It requires relatively few computational resources, because it periodically inherits weights from currently well-performing trials to explore better ones. With PBTTuner, users end up with a trained model rather than a configuration that could reproduce the trained model by training it from scratch; this is because model weights are inherited periodically throughout the search process. PBT can therefore also be seen as a training approach. If you don't need a specific configuration but just expect a good model, PBTTuner is a good choice. [See details](./PBTTuner.md)
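
As a purely conceptual sketch (not NNI's actual implementation), one PBT step looks roughly like this: after every member of the population trains for a few epochs, poorly performing members inherit the checkpoint of well-performing ones and perturb their hyperparameters, which is what the `factors` and `fraction` arguments listed below control.

```python
import random

def pbt_step(population, factors=(1.2, 0.8), fraction=0.2):
    """One exploit/explore step over a population of trials.

    Each member is assumed to be a dict with "score", "checkpoint", and "hyperparams".
    """
    population.sort(key=lambda member: member["score"], reverse=True)
    cutoff = max(1, int(len(population) * fraction))
    top, bottom = population[:cutoff], population[-cutoff:]
    for member in bottom:
        source = random.choice(top)
        member["checkpoint"] = source["checkpoint"]      # exploit: inherit weights from a good trial
        member["hyperparams"] = {
            name: value * random.choice(factors)         # explore: perturb the inherited hyperparameters
            for name, value in source["hyperparams"].items()
        }
    return population
```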

**classArgs requirements:**

* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will try to maximize the metric. If 'minimize', the tuner will try to minimize the metric.
* **all_checkpoint_dir** (*str, optional, default = None*) - Directory for trials to load and save checkpoints. If not specified, the directory will be "~/nni/checkpoint/<exp-id>". Note that if the experiment is not in local mode, users should provide a path on shared storage that can be accessed by all the trials.
* **population_size** (*int, optional, default = 10*) - Number of trials for each step. In our implementation, one step is running each trial by specific training epochs set by users.
* **population_size** (*int, optional, default = 10*) - Number of trials in a population; each step runs this number of trials. In our implementation, one step runs each trial for a user-specified number of training epochs.
* **factors** (*tuple, optional, default = (1.2, 0.8)*) - Factors for perturbation of hyperparameters.
* **fraction** (*float, optional, default = 0.2*) - Fraction for selecting bottom and top trials.

@@ -482,6 +482,10 @@ tuner:
  classArgs:
    optimize_mode: maximize
```

Note that to use this tuner, your trial code needs to be modified accordingly; please refer to [the document of PBTTuner](./PBTTuner.md) for details.
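
As a hedged sketch of the kind of change required: the trial receives checkpoint directories alongside its hyperparameters, loads weights if a checkpoint exists, trains for one step's worth of epochs, and saves a checkpoint before reporting. The parameter keys (`load_checkpoint_dir`, `save_checkpoint_dir`) and the helper functions below are assumptions; check them against [the document of PBTTuner](./PBTTuner.md).

```python
import os
import torch
import nni

params = nni.get_next_parameter()

# Assumed keys: PBTTuner is expected to pass checkpoint directories along with the hyperparameters.
load_dir = params.get("load_checkpoint_dir")
save_dir = params.get("save_checkpoint_dir")

model = build_model(params)  # hypothetical helper: build the model from the received hyperparameters
if load_dir:
    ckpt = os.path.join(load_dir, "model.pth")
    if os.path.exists(ckpt):
        model.load_state_dict(torch.load(ckpt))  # resume from the inherited checkpoint

accuracy = train_for_one_step(model, params)  # hypothetical helper: a few training epochs = one PBT step

os.makedirs(save_dir, exist_ok=True)
torch.save(model.state_dict(), os.path.join(save_dir, "model.pth"))
nni.report_final_result(accuracy)
```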


## **Reference and Feedback**
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;