From b7c686f649f568b45c54b8e2bfc39a8450793108 Mon Sep 17 00:00:00 2001
From: xuehui <xuehui@microsoft.com>
Date: Sun, 7 Oct 2018 16:00:54 +0800
Subject: [PATCH] Update ga squad (#104)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* update readme
---
 docs/WriteYourTrial.md             |   3 +
 examples/trials/ga_squad/README.md | 506 ++++++++++++++---------------
 2 files changed, 255 insertions(+), 254 deletions(-)

diff --git a/docs/WriteYourTrial.md b/docs/WriteYourTrial.md
index 18388aa9fd..328f273f11 100644
--- a/docs/WriteYourTrial.md
+++ b/docs/WriteYourTrial.md
@@ -121,3 +121,6 @@ In the yaml configure file, you need to set *useAnnotation* to true to enable NN
 ```
 useAnnotation: true
 ```
+
+## More Trial Example
+* [Automatic Model Architecture Search for Reading Comprehension.](../examples/trials/ga_squad/README.md)
\ No newline at end of file
diff --git a/examples/trials/ga_squad/README.md b/examples/trials/ga_squad/README.md
index ab8ba853f7..08024d07be 100644
--- a/examples/trials/ga_squad/README.md
+++ b/examples/trials/ga_squad/README.md
@@ -1,254 +1,252 @@
-# Automatic Model Architecture Search for Reading Comprehension
-This example shows us how to use Genetic Algorithm to find good model architectures for Reading Comprehension task.
-
-## Search Space
-Since attention and recurrent neural network (RNN) module have been proven effective in Reading Comprehension.
-We conclude the search space as follow:
-
-1. IDENTITY (Effectively means keep training).
-2. INSERT-RNN-LAYER (Inserts a LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
-3. REMOVE-RNN-LAYER
-4. INSERT-ATTENTION-LAYER(Inserts a attention layer.)
-5. REMOVE-ATTENTION-LAYER
-6. ADD-SKIP (Identity between random layers).
-7. REMOVE-SKIP (Removes random skip).
-
-![ga-squad-logo](./ga_squad.png)
-
-## New version
-Also we have another version which time cost is less and performance is better. We will release soon.
-
-# How to run this example?
-
-## Download data
-
-### Use downloading script to download data
-
-Execute the following command to download needed files
-using the downloading script:
-
-```
-chmod +x ./download.sh
-./download.sh
-```
-
-### Download manually
-
-1. download "dev-v1.1.json" and "train-v1.1.json" in https://rajpurkar.github.io/SQuAD-explorer/
-
-```
-wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
-wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
-```
-
-2. download "glove.840B.300d.txt" in https://nlp.stanford.edu/projects/glove/
-
-```
-wget http://nlp.stanford.edu/data/glove.840B.300d.zip
-unzip glove.840B.300d.zip
-```
-
-## Update configuration
-Modify `nni/examples/trials/ga_squad/config.yaml`, here is the default configuration:
-
-```
-authorName: default
-experimentName: example_ga_squad
-trialConcurrency: 1
-maxExecDuration: 1h
-maxTrialNum: 1
-#choice: local, remote
-trainingServicePlatform: local
-#choice: true, false
-useAnnotation: false
-tuner:
-  codeDir: ~/nni/examples/tuners/ga_customer_tuner
-  classFileName: customer_tuner.py
-  className: CustomerTuner
-  classArgs:
-    optimize_mode: maximize
-trial:
-  command: python3 trial.py
-  codeDir: ~/nni/examples/trials/ga_squad
-  gpuNum: 0
-```
-
-In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.
-
-`trialConcurrency` is the number of trials running concurrently, which is the number of GPUs you want to use, if you are setting `gpuNum` to 1.
-
-## submit this job
-
-```
-nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
-```
-
-# Techinal details about the trial
-
-## How does it works
-The evolution-algorithm based architecture for question answering has two different parts just like any other examples: the trial and the tuner.
-
-### The trial
-
-The trial has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
-
-* `attention.py` contains an implementaion for attention mechanism in Tensorflow.
-* `data.py` contains functions for data preprocessing.
-* `evaluate.py` contains the evaluation script.
-* `graph.py` contains the definition of the computation graph.
-* `rnn.py` contains an implementaion for GRU in Tensorflow.
-* `train_model.py` is a wrapper for the whole question answering model.
-
-Among those files, `trial.py` and `graph_to_tf.py` is special.
-
-`graph_to_tf.py` has a function named as `graph_to_network`, here is its skelton code:
-
-```
-def graph_to_network(input1,
-                     input2,
-                     input1_lengths,
-                     input2_lengths,
-                     graph,
-                     dropout_rate,
-                     is_training,
-                     num_heads=1,
-                     rnn_units=256):
-    topology = graph.is_topology()
-    layers = dict()
-    layers_sequence_lengths = dict()
-    num_units = input1.get_shape().as_list()[-1]
-    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
-        positional_encoding(input1, scale=False, zero_pad=False)
-    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
-    layers[0] = dropout(layers[0], dropout_rate, is_training)
-    layers[1] = dropout(layers[1], dropout_rate, is_training)
-    layers_sequence_lengths[0] = input1_lengths
-    layers_sequence_lengths[1] = input2_lengths
-    for _, topo_i in enumerate(topology):
-        if topo_i == '|':
-            continue
-        if graph.layers[topo_i].graph_type == LayerType.input.value:
-            # ......
-        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
-            # ......
-        # More layers to handle
-```
-
-As we can see, this function is actually a compiler, that converts the internal model DAG configuration (which will be introduced in the `Model configuration format` section) `graph`, to a Tensorflow computation graph.
-
-```
-topology = graph.is_topology()
-```
-
-performs topological sorting on the internal graph representation, and the code inside the loop:
-
-```
-for _, topo_i in enumerate(topology):
-```
-
-performs actually conversion that maps each layer to a part in Tensorflow computation graph.
-
-### The tuner
-
-The tuner is much more simple than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, the most important class in which is `CustomerTuner`:
-
-```
-class CustomerTuner(Tuner):
-    # ......
-
-    def generate_parameters(self, parameter_id):
-        """Returns a set of trial graph config, as a serializable object.
-        parameter_id : int
-        """
-        if len(self.population) <= 0:
-            logger.debug("the len of poplution lower than zero.")
-            raise Exception('The population is empty')
-        pos = -1
-        for i in range(len(self.population)):
-            if self.population[i].result == None:
-                pos = i
-                break
-        if pos != -1:
-            indiv = copy.deepcopy(self.population[pos])
-            self.population.pop(pos)
-            temp = json.loads(graph_dumps(indiv.config))
-        else:
-            random.shuffle(self.population)
-            if self.population[0].result > self.population[1].result:
-                self.population[0] = self.population[1]
-            indiv = copy.deepcopy(self.population[0])
-            self.population.pop(1)
-            indiv.mutation()
-            graph = indiv.config
-            temp =  json.loads(graph_dumps(graph))
-    
-    # ......
-```
-
-As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
-
-```
-            if self.population[0].result > self.population[1].result:
-                self.population[0] = self.population[1]
-            indiv = copy.deepcopy(self.population[0])
-```
-
-controls the mutation process. It will always take two random individuals in the population, only keeping and mutating the one with better result.
-
-## Model configuration format
-
-Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
-
-```
-{
-    "max_layer_num": 50,
-    "layers": [
-        {
-            "input_size": 0,
-            "type": 3,
-            "output_size": 1,
-            "input": [],
-            "size": "x",
-            "output": [4, 5],
-            "is_delete": false
-        },
-        {
-            "input_size": 0,
-            "type": 3,
-            "output_size": 1,
-            "input": [],
-            "size": "y",
-            "output": [4, 5],
-            "is_delete": false
-        },
-        {
-            "input_size": 1,
-            "type": 4,
-            "output_size": 0,
-            "input": [6],
-            "size": "x",
-            "output": [],
-            "is_delete": false
-        },
-        {
-            "input_size": 1,
-            "type": 4,
-            "output_size": 0,
-            "input": [5],
-            "size": "y",
-            "output": [],
-            "is_delete": false
-        },
-        {"Comment": "More layers will be here for actual graphs."}
-    ]
-}
-```
-
-Every model configuration will has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
-
- * `type` is the type of the layer. 0, 1, 2, 3, 4 corresponde to attention, self-attention, RNN, input and output layer respectively.
- * `size` is the length of the output. "x", "y" corresponde to document length / question length, respectively.
- * `input_size` is the number of inputs the layer has.
- * `input` is the indices of layers taken as input of this layer.
- * `output` is the indices of layers use this layer's output as their input.
- * `is_delete` means whether the layer is still available.
\ No newline at end of file
+# Automatic Model Architecture Search for Reading Comprehension
+This example shows us how to use Genetic Algorithm to find good model architectures for Reading Comprehension task.
+
+## Search Space
+Since attention and recurrent neural network (RNN) module have been proven effective in Reading Comprehension.
+We conclude the search space as follow:
+
+1. IDENTITY (Effectively means keep training).
+2. INSERT-RNN-LAYER (Inserts a LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
+3. REMOVE-RNN-LAYER
+4. INSERT-ATTENTION-LAYER(Inserts a attention layer.)
+5. REMOVE-ATTENTION-LAYER
+6. ADD-SKIP (Identity between random layers).
+7. REMOVE-SKIP (Removes random skip).
+
+![ga-squad-logo](./ga_squad.png)
+
+## New version
+Also we have another version which time cost is less and performance is better. We will release soon.
+
+# How to run this example?
+
+## Use downloading script to download data
+
+Execute the following command to download needed files
+using the downloading script:
+
+```
+chmod +x ./download.sh
+./download.sh
+```
+
+## Download manually
+
+1. download "dev-v1.1.json" and "train-v1.1.json" in https://rajpurkar.github.io/SQuAD-explorer/
+
+```
+wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
+wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
+```
+
+2. download "glove.840B.300d.txt" in https://nlp.stanford.edu/projects/glove/
+
+```
+wget http://nlp.stanford.edu/data/glove.840B.300d.zip
+unzip glove.840B.300d.zip
+```
+
+## Update configuration
+Modify `nni/examples/trials/ga_squad/config.yaml`, here is the default configuration:
+
+```
+authorName: default
+experimentName: example_ga_squad
+trialConcurrency: 1
+maxExecDuration: 1h
+maxTrialNum: 1
+#choice: local, remote
+trainingServicePlatform: local
+#choice: true, false
+useAnnotation: false
+tuner:
+  codeDir: ~/nni/examples/tuners/ga_customer_tuner
+  classFileName: customer_tuner.py
+  className: CustomerTuner
+  classArgs:
+    optimize_mode: maximize
+trial:
+  command: python3 trial.py
+  codeDir: ~/nni/examples/trials/ga_squad
+  gpuNum: 0
+```
+
+In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.
+
+`trialConcurrency` is the number of trials running concurrently, which is the number of GPUs you want to use, if you are setting `gpuNum` to 1.
+
+## submit this job
+
+```
+nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
+```
+
+# Techinal details about the trial
+
+## How does it works
+The evolution-algorithm based architecture for question answering has two different parts just like any other examples: the trial and the tuner.
+
+### The trial
+
+The trial has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
+
+* `attention.py` contains an implementaion for attention mechanism in Tensorflow.
+* `data.py` contains functions for data preprocessing.
+* `evaluate.py` contains the evaluation script.
+* `graph.py` contains the definition of the computation graph.
+* `rnn.py` contains an implementaion for GRU in Tensorflow.
+* `train_model.py` is a wrapper for the whole question answering model.
+
+Among those files, `trial.py` and `graph_to_tf.py` is special.
+
+`graph_to_tf.py` has a function named as `graph_to_network`, here is its skelton code:
+
+```
+def graph_to_network(input1,
+                     input2,
+                     input1_lengths,
+                     input2_lengths,
+                     graph,
+                     dropout_rate,
+                     is_training,
+                     num_heads=1,
+                     rnn_units=256):
+    topology = graph.is_topology()
+    layers = dict()
+    layers_sequence_lengths = dict()
+    num_units = input1.get_shape().as_list()[-1]
+    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
+        positional_encoding(input1, scale=False, zero_pad=False)
+    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
+    layers[0] = dropout(layers[0], dropout_rate, is_training)
+    layers[1] = dropout(layers[1], dropout_rate, is_training)
+    layers_sequence_lengths[0] = input1_lengths
+    layers_sequence_lengths[1] = input2_lengths
+    for _, topo_i in enumerate(topology):
+        if topo_i == '|':
+            continue
+        if graph.layers[topo_i].graph_type == LayerType.input.value:
+            # ......
+        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
+            # ......
+        # More layers to handle
+```
+
+As we can see, this function is actually a compiler, that converts the internal model DAG configuration (which will be introduced in the `Model configuration format` section) `graph`, to a Tensorflow computation graph.
+
+```
+topology = graph.is_topology()
+```
+
+performs topological sorting on the internal graph representation, and the code inside the loop:
+
+```
+for _, topo_i in enumerate(topology):
+```
+
+performs actually conversion that maps each layer to a part in Tensorflow computation graph.
+
+### The tuner
+
+The tuner is much more simple than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, the most important class in which is `CustomerTuner`:
+
+```
+class CustomerTuner(Tuner):
+    # ......
+
+    def generate_parameters(self, parameter_id):
+        """Returns a set of trial graph config, as a serializable object.
+        parameter_id : int
+        """
+        if len(self.population) <= 0:
+            logger.debug("the len of poplution lower than zero.")
+            raise Exception('The population is empty')
+        pos = -1
+        for i in range(len(self.population)):
+            if self.population[i].result == None:
+                pos = i
+                break
+        if pos != -1:
+            indiv = copy.deepcopy(self.population[pos])
+            self.population.pop(pos)
+            temp = json.loads(graph_dumps(indiv.config))
+        else:
+            random.shuffle(self.population)
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+            self.population.pop(1)
+            indiv.mutation()
+            graph = indiv.config
+            temp =  json.loads(graph_dumps(graph))
+    
+    # ......
+```
+
+As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
+
+```
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+```
+
+controls the mutation process. It will always take two random individuals in the population, only keeping and mutating the one with better result.
+
+## Model configuration format
+
+Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
+
+```
+{
+    "max_layer_num": 50,
+    "layers": [
+        {
+            "input_size": 0,
+            "type": 3,
+            "output_size": 1,
+            "input": [],
+            "size": "x",
+            "output": [4, 5],
+            "is_delete": false
+        },
+        {
+            "input_size": 0,
+            "type": 3,
+            "output_size": 1,
+            "input": [],
+            "size": "y",
+            "output": [4, 5],
+            "is_delete": false
+        },
+        {
+            "input_size": 1,
+            "type": 4,
+            "output_size": 0,
+            "input": [6],
+            "size": "x",
+            "output": [],
+            "is_delete": false
+        },
+        {
+            "input_size": 1,
+            "type": 4,
+            "output_size": 0,
+            "input": [5],
+            "size": "y",
+            "output": [],
+            "is_delete": false
+        },
+        {"Comment": "More layers will be here for actual graphs."}
+    ]
+}
+```
+
+Every model configuration will has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
+
+ * `type` is the type of the layer. 0, 1, 2, 3, 4 corresponde to attention, self-attention, RNN, input and output layer respectively.
+ * `size` is the length of the output. "x", "y" corresponde to document length / question length, respectively.
+ * `input_size` is the number of inputs the layer has.
+ * `input` is the indices of layers taken as input of this layer.
+ * `output` is the indices of layers use this layer's output as their input.
+ * `is_delete` means whether the layer is still available.