Dev hyperband (#405)
* support hyperband

* add example for hyperband

* register Hyperband in tuner

* after debug

* update doc

* trivial change

* update spec validation of yaml config

* modify nnictl launcher

* modify nnimanager and util to support advisor

* Quick fix nnictl config logic (#289)

* fix nnictl bug

* fix install.sh

* add desc for Dockerfile.build.base

* update document for Dockerfile

* update

* refactor port detect

* update

* refactor NNICTLDOC.md

* add document for pai and nnictl

* add default value for port

* add exception handling in trial_keeper.py

* fix port bug

* fix resume

* fix nnictl resume and fix nnictl stop

* fix document

* update

* refactor nnictl

* update

* update doc

* update

* update nnictl

* fix comment

* revert dockerfile

* update

* update

* update

* fix nnictl error hit

* fix comments

* fix bash-completion

* fix paramiko install

* quick fix resume logic

* update

* quick fix nnictl

* refactor sdk main

* update unit test accordingly

* update example's config file

* update restserver validation

* PR merge to 0.3 (#297)

* refactor doc

* update with Mao's suggestions

* Set theme jekyll-theme-dinky

* update doc

* fix links

* fix links

* fix links

* merge

* fix links and doc errors

* merge

* merge

* merge

* merge

* Update README.md (#288)

added License badge

* merge

* updated the "Contribute" part (merged Gems' wiki in, updated ReadMe)

* fix link

* fix doc mistakes and broken links. (#271)

* refactor doc

* update with Mao's suggestions

* Set theme jekyll-theme-dinky

* updated the "Contribute" part (merged Gems' wiki in, updated ReadMe)

* fix link

* Update README.md

* Fix misspelling in examples/trials/ga_squad/README.md

* revise the installation cmd to v0.2

* revise to install v0.2

* remove files

* update

* remove enas readme (#292)

* support checkpoint directory

* Fix datastore performance issue (#301)

* fix pylint

* Fix nnictl in v0.3 (#299)

Fix old version of config file
fix sklearn requirements
Fix resume log logic

* modify log

* trivial changes

* update example

* update makefile

* update launcher.py to fix the problem of finding main.js

* debug

* add hyperparameter info into trial_end api

* fix bug and update example

* fix error induced by merge

* support initialize

* add doc for hyperband

* fix bugs and add config_pai

* fix bugs and add config_pai

* fix bugs and add config_pai

* fix bugs and add config_pai

* update doc

* add doc for advisor

* fit

* modification based on hui's comments

* update doc
QuanluZhang authored Nov 30, 2018
1 parent d2f0638 commit a387250
Showing 23 changed files with 1,067 additions and 132 deletions.
5 changes: 4 additions & 1 deletion docs/howto_2_CustomizedTuner.md
@@ -6,7 +6,7 @@ So, if a user wants to implement a customized Tuner, he/she only needs to:

1) Inherit from the base Tuner class
2) Implement the receive_trial_result and generate_parameters functions
3) Write a script to run Tuner
3) Configure your customized tuner in experiment yaml config file

Here is an example:

@@ -93,3 +93,6 @@ For more detailed examples, see:
> * [evolution-tuner](../src/sdk/pynni/nni/evolution_tuner)
> * [hyperopt-tuner](../src/sdk/pynni/nni/hyperopt_tuner)
> * [evolution-based-customized-tuner](../examples/tuners/ga_customer_tuner)
## Write a more advanced automl algorithm

The methods above are usually enough to write a general tuner. However, users may also want access to more information, such as intermediate results and trials' states (i.e., the methods available to an assessor), in order to build a more powerful automl algorithm. Therefore, we have another concept called `advisor`, which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](../src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [How To - Customize Your Own Advisor](howto_3_CustomizedAdvisor.md) for how to write a customized advisor.
39 changes: 39 additions & 0 deletions docs/howto_3_CustomizedAdvisor.md
@@ -0,0 +1,39 @@
# **How To** - Customize Your Own Advisor

*Advisor targets the scenario where an automl algorithm needs the methods of both a tuner and an assessor. It is similar to a tuner in that it receives trial configuration requests and final results, and generates trial configurations. It is also similar to an assessor in that it receives intermediate results and trials' end states, and can send trial kill commands. Note that if you use an Advisor, you cannot use a tuner or an assessor at the same time.*

So, if a user wants to implement a customized Advisor, he/she only needs to:

1) Define an Advisor inheriting from the MsgDispatcherBase class
2) Implement the methods with prefix `handle_` except `handle_request`
3) Configure your customized Advisor in experiment yaml config file

Here is an example:

**1) Define an Advisor inheriting from the MsgDispatcherBase class**
```python
from nni.msg_dispatcher_base import MsgDispatcherBase

class CustomizedAdvisor(MsgDispatcherBase):
    def __init__(self, ...):
        ...
```

**2) Implement the methods with prefix `handle_` except `handle_request`**

Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](../src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
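
A minimal sketch of step 2 is shown below. It assumes that the `handle_` methods exposed by `MsgDispatcherBase` in your NNI version include the ones implemented by the Hyperband advisor (`handle_initialize`, `handle_update_search_space`, `handle_request_trial_jobs`, `handle_report_metric_data`, `handle_trial_end`); check `msg_dispatcher_base.py` for the authoritative list. The method bodies are placeholders, not working advisor logic.

```python
from nni.msg_dispatcher_base import MsgDispatcherBase

class MySkeletonAdvisor(MsgDispatcherBase):
    '''A skeleton only: method names follow the Hyperband advisor, bodies are stubs.'''

    def __init__(self, optimize_mode='maximize'):
        super().__init__()
        self.optimize_mode = optimize_mode
        self.search_space = None

    def handle_initialize(self, data):
        # 'data' carries the search space when the experiment starts.
        self.handle_update_search_space(data)

    def handle_update_search_space(self, data):
        # Remember the latest search space so new trials can be sampled from it.
        self.search_space = data

    def handle_request_trial_jobs(self, data):
        # 'data' is the number of trial configurations being requested;
        # generate that many hyperparameter sets and dispatch them here.
        pass

    def handle_report_metric_data(self, data):
        # Receives both intermediate and final metrics reported by trials.
        pass

    def handle_trial_end(self, data):
        # Called when a trial finishes (succeeded, failed, or was killed).
        pass
```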

**3) Configure your customized Advisor in experiment yaml config file**

This is similar to configuring a tuner or an assessor: NNI needs to locate your customized Advisor class and instantiate it, so you need to specify the location of the customized Advisor class and pass literal values as parameters to its \_\_init__ constructor.

```yaml
advisor:
  codeDir: /home/abc/myadvisor
  classFileName: my_customized_advisor.py
  className: CustomizedAdvisor
  # Any parameter you need to pass to your advisor class's __init__ constructor
  # can be specified in this optional classArgs field, for example
  classArgs:
    arg1: value1
```
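
As a rough mental model (illustrative, not NNI's actual loader code), the `classArgs` mapping ends up as keyword arguments of your constructor:

```python
# Roughly equivalent to what the config above asks NNI to do:
advisor = CustomizedAdvisor(arg1='value1')
```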
25 changes: 25 additions & 0 deletions examples/trials/mnist-hyperband/config.yml
@@ -0,0 +1,25 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 2
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS (could be the number of mini-batches or epochs) that can be
    #   allocated to a trial. Each trial should use STEPS to control how long it runs.
    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
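
For intuition on what `R` and `eta` imply, the sketch below reproduces the standard Hyperband bracket arithmetic (`s_max = floor(log_eta(R))`, followed by successive halving within each bracket). It is only an illustration of the schedule; NNI's built-in Hyperband advisor keeps its own bookkeeping and may round slightly differently.

```python
import math

def hyperband_schedule(R=100, eta=3):
    """Print the (trials, STEPS) pairs of each Hyperband bracket (illustrative only)."""
    s_max = int(math.floor(math.log(R, eta)))
    budget = (s_max + 1) * R                                  # total budget per bracket
    for s in range(s_max, -1, -1):
        n = int(math.ceil(budget / R * eta ** s / (s + 1)))   # trials started in bracket s
        r = R * eta ** (-s)                                   # initial STEPS per trial
        print(f'bracket s={s}:')
        for i in range(s + 1):
            n_i = int(math.floor(n * eta ** (-i)))            # trials kept in round i
            r_i = int(r * eta ** i)                           # STEPS per trial in round i
            print(f'  round {i}: {n_i} trials x {r_i} STEPS')

if __name__ == '__main__':
    hyperband_schedule(R=100, eta=3)
```

Assuming NNI is installed, the experiment itself is launched in the usual way, e.g. `nnictl create --config config.yml` from this directory.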
39 changes: 39 additions & 0 deletions examples/trials/mnist-hyperband/config_pai.yml
@@ -0,0 +1,39 @@
authorName: default
experimentName: example_mnist_hyperband
maxExecDuration: 1h
maxTrialNum: 10000
trialConcurrency: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS
    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
  #The docker image to run nni job on pai
  image: openpai/pai.example.tensorflow
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login pai
  userName: username
  #The password to login pai
  passWord: password
  #The host of restful server of pai
  host: 10.10.10.10
236 changes: 236 additions & 0 deletions examples/trials/mnist-hyperband/mnist.py
@@ -0,0 +1,236 @@
"""A deep MNIST classifier using convolutional layers."""

import logging
import math
import tempfile
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

import nni

FLAGS = None

logger = logging.getLogger('mnist_AutoML')


class MnistNetwork(object):
    '''
    MnistNetwork is for initializing and building a basic network for mnist.
    '''
    def __init__(self,
                 channel_1_num,
                 channel_2_num,
                 conv_size,
                 hidden_size,
                 pool_size,
                 learning_rate,
                 x_dim=784,
                 y_dim=10):
        self.channel_1_num = channel_1_num
        self.channel_2_num = channel_2_num
        self.conv_size = conv_size
        self.hidden_size = hidden_size
        self.pool_size = pool_size
        self.learning_rate = learning_rate
        self.x_dim = x_dim
        self.y_dim = y_dim

        self.images = tf.placeholder(tf.float32, [None, self.x_dim], name='input_x')
        self.labels = tf.placeholder(tf.float32, [None, self.y_dim], name='input_y')
        self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')

        self.train_step = None
        self.accuracy = None

    def build_network(self):
        '''
        Building network for mnist
        '''

        # Reshape to use within a convolutional neural net.
        # Last dimension is for "features" - there is only one here, since images are
        # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
        with tf.name_scope('reshape'):
            try:
                input_dim = int(math.sqrt(self.x_dim))
            except:
                print(
                    'input dim cannot be sqrt and reshape. input dim: ' + str(self.x_dim))
                logger.debug(
                    'input dim cannot be sqrt and reshape. input dim: %s', str(self.x_dim))
                raise
            x_image = tf.reshape(self.images, [-1, input_dim, input_dim, 1])

        # First convolutional layer - maps one grayscale image to 32 feature maps.
        with tf.name_scope('conv1'):
            w_conv1 = weight_variable(
                [self.conv_size, self.conv_size, 1, self.channel_1_num])
            b_conv1 = bias_variable([self.channel_1_num])
            h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)

        # Pooling layer - downsamples by 2X.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool(h_conv1, self.pool_size)

        # Second convolutional layer -- maps 32 feature maps to 64.
        with tf.name_scope('conv2'):
            w_conv2 = weight_variable([self.conv_size, self.conv_size,
                                       self.channel_1_num, self.channel_2_num])
            b_conv2 = bias_variable([self.channel_2_num])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)

        # Second pooling layer.
        with tf.name_scope('pool2'):
            h_pool2 = max_pool(h_conv2, self.pool_size)

        # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
        # is down to 7x7x64 feature maps -- maps this to 1024 features.
        last_dim = int(input_dim / (self.pool_size * self.pool_size))
        with tf.name_scope('fc1'):
            w_fc1 = weight_variable(
                [last_dim * last_dim * self.channel_2_num, self.hidden_size])
            b_fc1 = bias_variable([self.hidden_size])

            h_pool2_flat = tf.reshape(
                h_pool2, [-1, last_dim * last_dim * self.channel_2_num])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

        # Dropout - controls the complexity of the model, prevents co-adaptation of features.
        with tf.name_scope('dropout'):
            h_fc1_drop = tf.nn.dropout(h_fc1, self.keep_prob)

        # Map the 1024 features to 10 classes, one for each digit
        with tf.name_scope('fc2'):
            w_fc2 = weight_variable([self.hidden_size, self.y_dim])
            b_fc2 = bias_variable([self.y_dim])
            y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2

        with tf.name_scope('loss'):
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=self.labels, logits=y_conv))
        with tf.name_scope('adam_optimizer'):
            self.train_step = tf.train.AdamOptimizer(
                self.learning_rate).minimize(cross_entropy)

        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(
                tf.argmax(y_conv, 1), tf.argmax(self.labels, 1))
            self.accuracy = tf.reduce_mean(
                tf.cast(correct_prediction, tf.float32))


def conv2d(x_input, w_matrix):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x_input, w_matrix, strides=[1, 1, 1, 1], padding='SAME')


def max_pool(x_input, pool_size):
    """max_pool downsamples a feature map by 2X."""
    return tf.nn.max_pool(x_input, ksize=[1, pool_size, pool_size, 1],
                          strides=[1, pool_size, pool_size, 1], padding='SAME')


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(params):
    '''
    Main function, build mnist network, run and send result to NNI.
    '''
    # Import data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    print('Mnist download data done.')
    logger.debug('Mnist download data done.')

    # Create the model
    # Build the graph for the deep net
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
                                 channel_2_num=params['channel_2_num'],
                                 conv_size=params['conv_size'],
                                 hidden_size=params['hidden_size'],
                                 pool_size=params['pool_size'],
                                 learning_rate=params['learning_rate'])
    mnist_network.build_network()
    logger.debug('Mnist build network done.')

    # Write log
    graph_location = tempfile.mkdtemp()
    logger.debug('Saving graph to: %s', graph_location)
    train_writer = tf.summary.FileWriter(graph_location)
    train_writer.add_graph(tf.get_default_graph())

    test_acc = 0.0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(params['batch_num']):
            batch = mnist.train.next_batch(params['batch_size'])
            mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
                                                    mnist_network.labels: batch[1],
                                                    mnist_network.keep_prob: 1 - params['dropout_rate']}
                                         )

            if i % 10 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
                               mnist_network.keep_prob: 1.0})

                nni.report_intermediate_result(test_acc)
                logger.debug('test accuracy %g', test_acc)
                logger.debug('Pipe send intermediate result done.')

        test_acc = mnist_network.accuracy.eval(
            feed_dict={mnist_network.images: mnist.test.images,
                       mnist_network.labels: mnist.test.labels,
                       mnist_network.keep_prob: 1.0})

        nni.report_final_result(test_acc)
        logger.debug('Final result is %g', test_acc)
        logger.debug('Send final result done.')


def generate_default_params():
    '''
    Generate default parameters for mnist network.
    '''
    params = {
        'data_dir': '/tmp/tensorflow/mnist/input_data',
        'dropout_rate': 0.5,
        'channel_1_num': 32,
        'channel_2_num': 64,
        'conv_size': 5,
        'pool_size': 2,
        'hidden_size': 1024,
        'learning_rate': 1e-4,
        'batch_size': 32}
    return params


if __name__ == '__main__':
    try:
        # get parameters from tuner
        RCV_PARAMS = nni.get_next_parameter()
        logger.debug(RCV_PARAMS)
        # run
        params = generate_default_params()
        params.update(RCV_PARAMS)
        '''
        If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial,
        there is one more key called `STEPS` besides the hyperparameters defined by the user.
        By using this `STEPS`, the trial can control how long it runs.
        '''
        params['batch_num'] = RCV_PARAMS['STEPS'] * 10
        main(params)
    except Exception as exception:
        logger.exception(exception)
        raise
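
One caveat about the `STEPS` handling above: `RCV_PARAMS['STEPS']` assumes the trial always runs under the Hyperband advisor, which injects that key. If the same script is reused with a plain tuner, a defensive lookup like the sketch below avoids a `KeyError`; the fallback of 20 STEPS is an arbitrary illustration, not something NNI prescribes.

```python
# Hedged variant of the STEPS handling above: fall back to a fixed budget
# when the trial is not driven by Hyperband (the default of 20 STEPS,
# i.e. 200 batches, is an arbitrary illustration).
params['batch_num'] = RCV_PARAMS.get('STEPS', 20) * 10
```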
7 changes: 7 additions & 0 deletions examples/trials/mnist-hyperband/search_space.json
@@ -0,0 +1,7 @@
{
"dropout_rate":{"_type":"uniform","_value":[0.5,0.9]},
"conv_size":{"_type":"choice","_value":[2,3,5,7]},
"hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
"batch_size": {"_type":"choice","_value":[8, 16, 32, 64]},
"learning_rate":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]}
}
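
For illustration, a single parameter set sampled from this space (plus the `STEPS` key injected by the Hyperband advisor) might look like the hypothetical dict below; the concrete values are made up.

```python
# Hypothetical output of nni.get_next_parameter() under Hyperband with this search space.
sampled_params = {
    'dropout_rate': 0.72,     # drawn uniformly from [0.5, 0.9]
    'conv_size': 5,           # one of the listed choices
    'hidden_size': 1024,
    'batch_size': 32,
    'learning_rate': 0.001,
    'STEPS': 11,              # injected by Hyperband, not defined in the JSON above
}
```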
2 changes: 1 addition & 1 deletion pylintrc
@@ -15,4 +15,4 @@ max-attributes=15
const-naming-style=any

disable=duplicate-code,
        super-init-not-called
        super-init-not-called
11 changes: 10 additions & 1 deletion src/nni_manager/common/manager.ts
@@ -35,7 +35,7 @@ interface ExperimentParams {
    trainingServicePlatform: string;
    multiPhase?: boolean;
    multiThread?: boolean;
    tuner: {
    tuner?: {
        className: string;
        builtinTunerName?: string;
        codeDir?: string;
@@ -53,6 +53,15 @@
        checkpointDir: string;
        gpuNum?: number;
    };
    advisor?: {
        className: string;
        builtinAdvisorName?: string;
        codeDir?: string;
        classArgs?: any;
        classFileName?: string;
        checkpointDir: string;
        gpuNum?: number;
    };
    clusterMetaData?: {
        key: string;
        value: string;