microsoft · leckie-chn · Jan 8, 2019 · Jan 7, 2019 · Jan 7, 2019 · Jan 8, 2019
diff --git a/docs/AdvancedNAS.md b/docs/AdvancedNAS.md
@@ -0,0 +1,73 @@
+# Tutorial for Advanced Neural Architecture Search
+Many NAS algorithms currently use **weight sharing** among trials to accelerate training. For example, [ENAS][1] improves efficiency by 1000x through '_parameter sharing between child models_', compared with the earlier [NASNet][2] algorithm. Other NAS algorithms such as [DARTS][3], [Network Morphism][4], and [Evolution][5] also use, or have the potential to use, weight sharing.
+
+This is a tutorial on how to enable weight sharing in NNI.
+
+## Weight Sharing among trials
+Currently we recommend sharing weights through NFS (Network File System), which supports sharing files across machines and is lightweight and (relatively) efficient. We also welcome contributions from the community on more efficient techniques.
+
+### NFS Setup
+In NFS, files are physically stored on a server machine, and trials on the client machine can read/write those files in the same way that they access local files.
+
+#### Install NFS on server machine
+First, install NFS server:
+```bash
+sudo apt-get install nfs-kernel-server
+```
+Suppose `/tmp/nni/shared` is used as the physical storage, then run:
+```bash
+sudo mkdir -p /tmp/nni/shared
+echo "/tmp/nni/shared *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
+sudo service nfs-kernel-server restart
+```
+You can check whether the directory above was exported successfully with `sudo showmount -e localhost`.
+
+#### Install NFS on client machine
+First, install NFS client:
+```bash
+sudo apt-get install nfs-common
+```
+Then create the mount point and mount the shared directory:
+```bash
+sudo mkdir -p /mnt/nfs/nni/
+sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni
+```
+where `10.10.10.10` should be replaced with the actual IP address of the NFS server machine.
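Before starting trials, it can help to confirm from Python that the mounted directory is actually readable and writable on each client. The following is a minimal sketch, not part of NNI: `check_shared_dir` is a hypothetical helper, and the fallback to a temporary directory is only there so the script runs on a machine without the mount.

```python
import os
import tempfile
import uuid


def check_shared_dir(path):
    """Return True if `path` exists and a probe file can be written, read back, and removed."""
    if not os.path.isdir(path):
        return False
    # Unique probe name so concurrent checks from several clients do not collide.
    probe = os.path.join(path, ".nni_probe_" + uuid.uuid4().hex)
    try:
        with open(probe, "w") as f:
            f.write("ok")
        with open(probe) as f:
            return f.read() == "ok"
    except OSError:
        return False
    finally:
        if os.path.exists(probe):
            os.remove(probe)


if __name__ == "__main__":
    # /mnt/nfs/nni is the mount point used above; a temp dir stands in when it is absent.
    target = "/mnt/nfs/nni" if os.path.isdir("/mnt/nfs/nni") else tempfile.mkdtemp()
    print(target, "usable:", check_shared_dir(target))
```

This only verifies basic file access from one client. It does not measure NFS latency or confirm that other clients see the same files.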
+
+### Weight Sharing through NFS file
+With NFS set up, trial code can share model weights by saving and loading files. For example, in TensorFlow:
+```python
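+# save the model (assumes `os` and `tf` are imported and `sess` is an active session)
+saver = tf.train.Saver()
+saver.save(sess, os.path.join(params['save_path'], 'model.ckpt'))
+# load the model saved by another trial
+saver.restore(sess, os.path.join(params['restore_path'], 'model.ckpt'))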
+```
+where the `'save_path'` and `'restore_path'` hyper-parameters can be managed by the tuner.
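One way a tuner could manage these two entries is to derive them from a shared root directory and the trial's parameter id. The sketch below is an illustration under assumptions: `make_share_paths` is a hypothetical helper, and the one-subdirectory-per-trial layout is not prescribed by NNI.

```python
import os


def make_share_paths(save_dir_root, parameter_id, parent_id=None):
    """Build the path entries a tuner could attach to a trial's hyper-parameters.

    Each trial saves into its own subdirectory of the shared root; a trial with a
    parent restores from the parent's subdirectory, otherwise it restores nothing.
    """
    save_path = os.path.join(save_dir_root, str(parameter_id))
    restore_path = None
    if parent_id is not None:
        restore_path = os.path.join(save_dir_root, str(parent_id))
    return {"save_path": save_path, "restore_path": restore_path}


if __name__ == "__main__":
    # The root is assumed to live on the NFS mount so every machine sees the same files.
    print(make_share_paths("/mnt/nfs/nni/ga_squad", parameter_id=3, parent_id=1))
```

Keying directories by parameter id keeps concurrent trials from overwriting each other's checkpoints, because each trial writes only to its own directory.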
+
+## Asynchronous Dispatcher Mode for trial dependency control
+With weight sharing, trials running on different machines depend on one another, and in most cases **read-after-write** consistency must be ensured: a child model should not load the parent model before the parent trial finishes training. To handle this, users can enable **asynchronous dispatcher mode** by setting `multiThread: true` in `config.yml`. In this mode the dispatcher assigns a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread decides when to submit the new trial by blocking and unblocking itself. For example:
+```python
+    def generate_parameters(self, parameter_id):
+        self.thread_lock.acquire()
+        indiv = ...  # placeholder: build the configuration for a new trial
+        self.events[parameter_id] = threading.Event()
+        self.thread_lock.release()
+        # wait outside the lock until the parent trial has reported its result
+        if indiv.parent_id is not None:
+            self.events[indiv.parent_id].wait()
+
+    def receive_trial_result(self, parameter_id, parameters, reward):
+        self.thread_lock.acquire()
+        # code for processing trial results
+        self.thread_lock.release()
+        self.events[parameter_id].set()
+```
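The blocking pattern above can be tried out in isolation. The following self-contained sketch uses only the standard library and is not NNI code: `DependencyGate`, its simplified method signatures, and `run_demo` are hypothetical stand-ins for a tuner and the dispatcher threads.

```python
import threading
import time


class DependencyGate:
    """Toy stand-in for a tuner: a child blocks until its parent reports a result."""

    def __init__(self):
        self.thread_lock = threading.Lock()
        self.events = {}
        self.log = []

    def generate_parameters(self, parameter_id, parent_id=None):
        with self.thread_lock:
            self.events[parameter_id] = threading.Event()
        if parent_id is not None:
            # Wait outside the lock so other tuner threads are not stalled.
            self.events[parent_id].wait()
        with self.thread_lock:
            self.log.append(("dispatched", parameter_id))
        return {"parameter_id": parameter_id, "parent_id": parent_id}

    def receive_trial_result(self, parameter_id):
        with self.thread_lock:
            self.log.append(("finished", parameter_id))
        # Release every thread waiting on this trial.
        self.events[parameter_id].set()


def run_demo():
    gate = DependencyGate()
    gate.generate_parameters(0)  # parent trial, no dependency
    child = threading.Thread(target=gate.generate_parameters, args=(1, 0))
    child.start()
    time.sleep(0.1)  # give the child thread time to block on the parent's event
    gate.receive_trial_result(0)  # parent finishes, child is released
    child.join(timeout=5)
    return gate.log


if __name__ == "__main__":
    print(run_demo())
```

The child is only dispatched after the parent's result has been recorded, which is the read-after-write ordering described above. Waiting on the event outside the lock is what lets other tuner threads keep making progress while one thread is blocked.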
+
+## Examples
+For details, please refer to this [simple weight sharing example](../test/async_sharing_test). We also provide a [practical example](../examples/trials/weight_sharing/ga_squad) for reading comprehension, based on the earlier [ga_squad](../examples/trials/ga_squad) example.
+
+[1]: https://arxiv.org/abs/1802.03268
+[2]: https://arxiv.org/abs/1707.07012
+[3]: https://arxiv.org/abs/1806.09055
+[4]: https://arxiv.org/abs/1806.10282
+[5]: https://arxiv.org/abs/1703.01041 
diff --git a/examples/trials/ga_squad/trial.py b/examples/trials/ga_squad/trial.py
@@ -338,7 +338,7 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
                 answers = generate_predict_json(
                     position1, position2, ids, contexts)
                 if save_path is not None:
-                    with open(save_path + 'epoch%d.prediction' % epoch, 'w') as file:
+                    with open(os.path.join(save_path, 'epoch%d.prediction' % epoch), 'w') as file:
                         json.dump(answers, file)
                 else:
                     answers = json.dumps(answers)
@@ -359,8 +359,8 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
                     bestacc = acc
 
                     if save_path is not None:
-                        saver.save(sess, save_path + 'epoch%d.model' % epoch)
-                        with open(save_path + 'epoch%d.score' % epoch, 'wb') as file:
+                        saver.save(sess, os.path.join(save_path, 'epoch%d.model' % epoch))
+                        with open(os.path.join(save_path, 'epoch%d.score' % epoch), 'wb') as file:
                             pickle.dump(
                                 (position1, position2, ids, contexts), file)
                 logger.debug('epoch %d acc %g bestacc %g' %

diff --git a/examples/trials/weight_sharing/ga_squad/attention.py b/examples/trials/weight_sharing/ga_squad/attention.py
@@ -0,0 +1,171 @@
+# Copyright (c) Microsoft Corporation
+# All rights reserved.
+#
+# MIT License
+#
+# Permission is hereby granted, free of charge,
+# to any person obtaining a copy of this software and associated
+# documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and
+# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
+# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+import math
+
+import tensorflow as tf
+from tensorflow.python.ops.rnn_cell_impl import RNNCell
+
+
+def _get_variable(variable_dict, name, shape, initializer=None, dtype=tf.float32):
+    if name not in variable_dict:
+        variable_dict[name] = tf.get_variable(
+            name=name, shape=shape, initializer=initializer, dtype=dtype)
+    return variable_dict[name]
+
+
+class DotAttention:
+    '''
+    DotAttention
+    '''
+
+    def __init__(self, name,
+                 hidden_dim,
+                 is_vanilla=True,
+                 is_identity_transform=False,
+                 need_padding=False):
+        self._name = '/'.join([name, 'dot_att'])
+        self._hidden_dim = hidden_dim
+        self._is_identity_transform = is_identity_transform
+        self._need_padding = need_padding
+        self._is_vanilla = is_vanilla
+        self._var = {}
+
+    @property
+    def is_identity_transform(self):
+        return self._is_identity_transform
+
+    @property
+    def is_vanilla(self):
+        return self._is_vanilla
+
+    @property
+    def need_padding(self):
+        return self._need_padding
+
+    @property
+    def hidden_dim(self):
+        return self._hidden_dim
+
+    @property
+    def name(self):
+        return self._name
+
+    @property
+    def var(self):
+        return self._var
+
+    def _get_var(self, name, shape, initializer=None):
+        with tf.variable_scope(self.name):
+            return _get_variable(self.var, name, shape, initializer)
+
+    def _define_params(self, src_dim, tgt_dim):
+        hidden_dim = self.hidden_dim
+        self._get_var('W', [src_dim, hidden_dim])
+        if not self.is_vanilla:
+            self._get_var('V', [src_dim, hidden_dim])
+            if self.need_padding:
+                self._get_var('V_s', [src_dim, src_dim])
+                self._get_var('V_t', [tgt_dim, tgt_dim])
+            if not self.is_identity_transform:
+                self._get_var('T', [tgt_dim, src_dim])
+        self._get_var('U', [tgt_dim, hidden_dim])
+        self._get_var('b', [1, hidden_dim])
+        self._get_var('v', [hidden_dim, 1])
+
+    def get_pre_compute(self, s):
+        '''
+        :param s: [src_sequence, batch_size, src_dim]
+        :return: [src_sequence, batch_size, hidden_dim]
+        '''
+        hidden_dim = self.hidden_dim
+        src_dim = s.get_shape().as_list()[-1]
+        assert src_dim is not None, 'src dim must be defined'
+        W = self._get_var('W', shape=[src_dim, hidden_dim])
+        b = self._get_var('b', shape=[1, hidden_dim])
+        return tf.tensordot(s, W, [[2], [0]]) + b
+
+    def get_prob(self, src, tgt, mask, pre_compute, return_logits=False):
+        '''
+        :param src: [src_sequence_length, batch_size, src_dim]
+        :param tgt: [batch_size, tgt_dim] or [tgt_sequence_length, batch_size, tgt_dim]
+        :param mask: [src_sequence_length, batch_size]\
+             or [tgt_sequence_length, src_sequence_length, batch_size]
+        :param pre_compute: [src_sequence_length, batch_size, hidden_dim]
+        :return: [src_sequence_length, batch_size]\
+             or [tgt_sequence_length, src_sequence_length, batch_size]
+        '''
+        s_shape = src.get_shape().as_list()
+        h_shape = tgt.get_shape().as_list()
+        src_dim = s_shape[-1]
+        tgt_dim = h_shape[-1]
+        assert src_dim is not None, 'src dimension must be defined'
+        assert tgt_dim is not None, 'tgt dimension must be defined'
+
+        self._define_params(src_dim, tgt_dim)
+
+        if len(h_shape) == 2:
+            tgt = tf.expand_dims(tgt, 0)
+        if pre_compute is None:
+            pre_compute = self.get_pre_compute(src)
+
+        buf0 = pre_compute
+        buf1 = tf.tensordot(tgt, self.var['U'], axes=[[2], [0]])
+        buf2 = tf.tanh(tf.expand_dims(buf0, 0) + tf.expand_dims(buf1, 1))
+
+        if not self.is_vanilla:
+            xh1 = tgt
+            xh2 = tgt
+            s1 = src
+            if self.need_padding:
+                xh1 = tf.tensordot(xh1, self.var['V_t'], 1)
+                xh2 = tf.tensordot(xh2, self.var['V_t'], 1)
+                s1 = tf.tensordot(s1, self.var['V_s'], 1)
+            if not self.is_identity_transform:
+                xh1 = tf.tensordot(xh1, self.var['T'], 1)
+                xh2 = tf.tensordot(xh2, self.var['T'], 1)
+            buf3 = tf.expand_dims(s1, 0) * tf.expand_dims(xh1, 1)
+            buf3 = tf.tanh(tf.tensordot(buf3, self.var['V'], axes=[[3], [0]]))
+            buf = tf.reshape(tf.tanh(buf2 + buf3), shape=tf.shape(buf3))
+        else:
+            buf = buf2
+        v = self.var['v']
+        e = tf.tensordot(buf, v, [[3], [0]])
+        e = tf.squeeze(e, axis=[3])
+        tmp = tf.reshape(e + (mask - 1) * 10000.0, shape=tf.shape(e))
+        prob = tf.nn.softmax(tmp, 1)
+        if len(h_shape) == 2:
+            prob = tf.squeeze(prob, axis=[0])
+            tmp = tf.squeeze(tmp, axis=[0])
+        if return_logits:
+            return prob, tmp
+        return prob
+
+    def get_att(self, s, prob):
+        '''
+        :param s: [src_sequence_length, batch_size, src_dim]
+        :param prob: [src_sequence_length, batch_size]\
+            or [tgt_sequence_length, src_sequence_length, batch_size]
+        :return: [batch_size, src_dim] or [tgt_sequence_length, batch_size, src_dim]
+        '''
+        buf = s * tf.expand_dims(prob, axis=-1)
+        att = tf.reduce_sum(buf, axis=-3)
+        return att
diff --git a/examples/trials/weight_sharing/ga_squad/config_remote.yml b/examples/trials/weight_sharing/ga_squad/config_remote.yml
@@ -0,0 +1,31 @@
+authorName: default
+experimentName: ga_squad_weight_sharing
+trialConcurrency: 2
+maxExecDuration: 1h
+maxTrialNum: 200
+#choice: local, remote, pai
+trainingServicePlatform: remote
+#choice: true, false
+useAnnotation: false
+multiThread: true
+tuner:
+  codeDir: ../../../tuners/weight_sharing/ga_customer_tuner
+  classFileName: customer_tuner.py 
+  className: CustomerTuner
+  classArgs:
+    optimize_mode: maximize
+    population_size: 32
+    save_dir_root: /mnt/nfs/nni/ga_squad
+trial:
+  command: python3 trial.py --input_file /mnt/nfs/nni/train-v1.1.json --dev_file /mnt/nfs/nni/dev-v1.1.json --max_epoch 1 --embedding_file /mnt/nfs/nni/glove.6B.300d.txt
+  codeDir: .
+  gpuNum: 1
+machineList:
+  - ip: remote-ip-0
+    port: 8022
+    username: root 
+    passwd: screencast
+  - ip: remote-ip-1
+    port: 8022
+    username: root 
+    passwd: screencast