From b7c686f649f568b45c54b8e2bfc39a8450793108 Mon Sep 17 00:00:00 2001
From: xuehui
Date: Sun, 7 Oct 2018 16:00:54 +0800
Subject: [PATCH] Update ga squad (#104)

* update readme in ga_squad
* update readme
* fix typo
* Update README.md
* Update README.md
* Update README.md
* update readme
---
 docs/WriteYourTrial.md             |   3 +
 examples/trials/ga_squad/README.md | 506 ++++++++++++++---------------
 2 files changed, 255 insertions(+), 254 deletions(-)

diff --git a/docs/WriteYourTrial.md b/docs/WriteYourTrial.md
index 18388aa9fd..328f273f11 100644
--- a/docs/WriteYourTrial.md
+++ b/docs/WriteYourTrial.md
@@ -121,3 +121,6 @@ In the yaml configure file, you need to set *useAnnotation* to true to enable NN
 ```
 useAnnotation: true
 ```
+
+## More Trial Example
+* [Automatic Model Architecture Search for Reading Comprehension.](../examples/trials/ga_squad/README.md)
\ No newline at end of file
diff --git a/examples/trials/ga_squad/README.md b/examples/trials/ga_squad/README.md
index ab8ba853f7..08024d07be 100644
--- a/examples/trials/ga_squad/README.md
+++ b/examples/trials/ga_squad/README.md
@@ -1,254 +1,252 @@
-# Automatic Model Architecture Search for Reading Comprehension
-This example shows us how to use Genetic Algorithm to find good model architectures for Reading Comprehension task.
-
-## Search Space
-Since attention and recurrent neural network (RNN) module have been proven effective in Reading Comprehension.
-We conclude the search space as follow:
-
-1. IDENTITY (Effectively means keep training).
-2. INSERT-RNN-LAYER (Inserts a LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
-3. REMOVE-RNN-LAYER
-4. INSERT-ATTENTION-LAYER(Inserts a attention layer.)
-5. REMOVE-ATTENTION-LAYER
-6. ADD-SKIP (Identity between random layers).
-7. REMOVE-SKIP (Removes random skip).
-
-![ga-squad-logo](./ga_squad.png)
-
-## New version
-Also we have another version which time cost is less and performance is better. We will release soon.
-
-# How to run this example?
-
-## Download data
-
-### Use downloading script to download data
-
-Execute the following command to download needed files
-using the downloading script:
-
-```
-chmod +x ./download.sh
-./download.sh
-```
-
-### Download manually
-
-1. download "dev-v1.1.json" and "train-v1.1.json" in https://rajpurkar.github.io/SQuAD-explorer/
-
-```
-wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
-wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
-```
-
-2. download "glove.840B.300d.txt" in https://nlp.stanford.edu/projects/glove/
-
-```
-wget http://nlp.stanford.edu/data/glove.840B.300d.zip
-unzip glove.840B.300d.zip
-```
-
-## Update configuration
-Modify `nni/examples/trials/ga_squad/config.yaml`, here is the default configuration:
-
-```
-authorName: default
-experimentName: example_ga_squad
-trialConcurrency: 1
-maxExecDuration: 1h
-maxTrialNum: 1
-#choice: local, remote
-trainingServicePlatform: local
-#choice: true, false
-useAnnotation: false
-tuner:
-  codeDir: ~/nni/examples/tuners/ga_customer_tuner
-  classFileName: customer_tuner.py
-  className: CustomerTuner
-  classArgs:
-    optimize_mode: maximize
-trial:
-  command: python3 trial.py
-  codeDir: ~/nni/examples/trials/ga_squad
-  gpuNum: 0
-```
-
-In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.
-
-`trialConcurrency` is the number of trials running concurrently, which is the number of GPUs you want to use, if you are setting `gpuNum` to 1.
-
-## submit this job
-
-```
-nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
-```
-
-# Techinal details about the trial
-
-## How does it works
-The evolution-algorithm based architecture for question answering has two different parts just like any other examples: the trial and the tuner.
-
-### The trial
-
-The trial has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
-
-* `attention.py` contains an implementaion for attention mechanism in Tensorflow.
-* `data.py` contains functions for data preprocessing.
-* `evaluate.py` contains the evaluation script.
-* `graph.py` contains the definition of the computation graph.
-* `rnn.py` contains an implementaion for GRU in Tensorflow.
-* `train_model.py` is a wrapper for the whole question answering model.
-
-Among those files, `trial.py` and `graph_to_tf.py` is special.
-
-`graph_to_tf.py` has a function named as `graph_to_network`, here is its skelton code:
-
-```
-def graph_to_network(input1,
-                     input2,
-                     input1_lengths,
-                     input2_lengths,
-                     graph,
-                     dropout_rate,
-                     is_training,
-                     num_heads=1,
-                     rnn_units=256):
-    topology = graph.is_topology()
-    layers = dict()
-    layers_sequence_lengths = dict()
-    num_units = input1.get_shape().as_list()[-1]
-    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
-        positional_encoding(input1, scale=False, zero_pad=False)
-    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
-    layers[0] = dropout(layers[0], dropout_rate, is_training)
-    layers[1] = dropout(layers[1], dropout_rate, is_training)
-    layers_sequence_lengths[0] = input1_lengths
-    layers_sequence_lengths[1] = input2_lengths
-    for _, topo_i in enumerate(topology):
-        if topo_i == '|':
-            continue
-        if graph.layers[topo_i].graph_type == LayerType.input.value:
-            # ......
-        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
-            # ......
-        # More layers to handle
-```
-
-As we can see, this function is actually a compiler, that converts the internal model DAG configuration (which will be introduced in the `Model configuration format` section) `graph`, to a Tensorflow computation graph.
-
-```
-topology = graph.is_topology()
-```
-
-performs topological sorting on the internal graph representation, and the code inside the loop:
-
-```
-for _, topo_i in enumerate(topology):
-```
-
-performs actually conversion that maps each layer to a part in Tensorflow computation graph.
-
-### The tuner
-
-The tuner is much more simple than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, the most important class in which is `CustomerTuner`:
-
-```
-class CustomerTuner(Tuner):
-    # ......
-
-    def generate_parameters(self, parameter_id):
-        """Returns a set of trial graph config, as a serializable object.
-        parameter_id : int
-        """
-        if len(self.population) <= 0:
-            logger.debug("the len of poplution lower than zero.")
-            raise Exception('The population is empty')
-        pos = -1
-        for i in range(len(self.population)):
-            if self.population[i].result == None:
-                pos = i
-                break
-        if pos != -1:
-            indiv = copy.deepcopy(self.population[pos])
-            self.population.pop(pos)
-            temp = json.loads(graph_dumps(indiv.config))
-        else:
-            random.shuffle(self.population)
-            if self.population[0].result > self.population[1].result:
-                self.population[0] = self.population[1]
-            indiv = copy.deepcopy(self.population[0])
-            self.population.pop(1)
-            indiv.mutation()
-            graph = indiv.config
-            temp = json.loads(graph_dumps(graph))
-
-    # ......
-```
-
-As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
-
-```
-            if self.population[0].result > self.population[1].result:
-                self.population[0] = self.population[1]
-            indiv = copy.deepcopy(self.population[0])
-```
-
-controls the mutation process. It will always take two random individuals in the population, only keeping and mutating the one with better result.
-
-## Model configuration format
-
-Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
-
-```
-{
-    "max_layer_num": 50,
-    "layers": [
-        {
-            "input_size": 0,
-            "type": 3,
-            "output_size": 1,
-            "input": [],
-            "size": "x",
-            "output": [4, 5],
-            "is_delete": false
-        },
-        {
-            "input_size": 0,
-            "type": 3,
-            "output_size": 1,
-            "input": [],
-            "size": "y",
-            "output": [4, 5],
-            "is_delete": false
-        },
-        {
-            "input_size": 1,
-            "type": 4,
-            "output_size": 0,
-            "input": [6],
-            "size": "x",
-            "output": [],
-            "is_delete": false
-        },
-        {
-            "input_size": 1,
-            "type": 4,
-            "output_size": 0,
-            "input": [5],
-            "size": "y",
-            "output": [],
-            "is_delete": false
-        },
-        {"Comment": "More layers will be here for actual graphs."}
-    ]
-}
-```
-
-Every model configuration will has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
-
- * `type` is the type of the layer. 0, 1, 2, 3, 4 corresponde to attention, self-attention, RNN, input and output layer respectively.
- * `size` is the length of the output. "x", "y" corresponde to document length / question length, respectively.
- * `input_size` is the number of inputs the layer has.
- * `input` is the indices of layers taken as input of this layer.
- * `output` is the indices of layers use this layer's output as their input.
- * `is_delete` means whether the layer is still available.
\ No newline at end of file
+# Automatic Model Architecture Search for Reading Comprehension
+This example shows us how to use Genetic Algorithm to find good model architectures for Reading Comprehension task.
+
+## Search Space
+Since attention and recurrent neural network (RNN) module have been proven effective in Reading Comprehension.
+We conclude the search space as follow:
+
+1. IDENTITY (Effectively means keep training).
+2. INSERT-RNN-LAYER (Inserts a LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
+3. REMOVE-RNN-LAYER
+4. INSERT-ATTENTION-LAYER(Inserts a attention layer.)
+5. REMOVE-ATTENTION-LAYER
+6. ADD-SKIP (Identity between random layers).
+7. REMOVE-SKIP (Removes random skip).
+
+![ga-squad-logo](./ga_squad.png)
+
+## New version
+Also we have another version which time cost is less and performance is better. We will release soon.
+
+# How to run this example?
+
+## Use downloading script to download data
+
+Execute the following command to download needed files
+using the downloading script:
+
+```
+chmod +x ./download.sh
+./download.sh
+```
+
+## Download manually
+
+1. download "dev-v1.1.json" and "train-v1.1.json" in https://rajpurkar.github.io/SQuAD-explorer/
+
+```
+wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
+wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
+```
+
+2. download "glove.840B.300d.txt" in https://nlp.stanford.edu/projects/glove/
+
+```
+wget http://nlp.stanford.edu/data/glove.840B.300d.zip
+unzip glove.840B.300d.zip
+```
+
+## Update configuration
+Modify `nni/examples/trials/ga_squad/config.yaml`, here is the default configuration:
+
+```
+authorName: default
+experimentName: example_ga_squad
+trialConcurrency: 1
+maxExecDuration: 1h
+maxTrialNum: 1
+#choice: local, remote
+trainingServicePlatform: local
+#choice: true, false
+useAnnotation: false
+tuner:
+  codeDir: ~/nni/examples/tuners/ga_customer_tuner
+  classFileName: customer_tuner.py
+  className: CustomerTuner
+  classArgs:
+    optimize_mode: maximize
+trial:
+  command: python3 trial.py
+  codeDir: ~/nni/examples/trials/ga_squad
+  gpuNum: 0
+```
+
+In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.
+
+`trialConcurrency` is the number of trials running concurrently, which is the number of GPUs you want to use, if you are setting `gpuNum` to 1.
+
+## submit this job
+
+```
+nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
+```
+
+# Techinal details about the trial
+
+## How does it works
+The evolution-algorithm based architecture for question answering has two different parts just like any other examples: the trial and the tuner.
+
+### The trial
+
+The trial has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
+
+* `attention.py` contains an implementaion for attention mechanism in Tensorflow.
+* `data.py` contains functions for data preprocessing.
+* `evaluate.py` contains the evaluation script.
+* `graph.py` contains the definition of the computation graph.
+* `rnn.py` contains an implementaion for GRU in Tensorflow.
+* `train_model.py` is a wrapper for the whole question answering model.
+
+Among those files, `trial.py` and `graph_to_tf.py` is special.
+
+`graph_to_tf.py` has a function named as `graph_to_network`, here is its skelton code:
+
+```
+def graph_to_network(input1,
+                     input2,
+                     input1_lengths,
+                     input2_lengths,
+                     graph,
+                     dropout_rate,
+                     is_training,
+                     num_heads=1,
+                     rnn_units=256):
+    topology = graph.is_topology()
+    layers = dict()
+    layers_sequence_lengths = dict()
+    num_units = input1.get_shape().as_list()[-1]
+    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
+        positional_encoding(input1, scale=False, zero_pad=False)
+    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
+    layers[0] = dropout(layers[0], dropout_rate, is_training)
+    layers[1] = dropout(layers[1], dropout_rate, is_training)
+    layers_sequence_lengths[0] = input1_lengths
+    layers_sequence_lengths[1] = input2_lengths
+    for _, topo_i in enumerate(topology):
+        if topo_i == '|':
+            continue
+        if graph.layers[topo_i].graph_type == LayerType.input.value:
+            # ......
+        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
+            # ......
+        # More layers to handle
+```
+
+As we can see, this function is actually a compiler, that converts the internal model DAG configuration (which will be introduced in the `Model configuration format` section) `graph`, to a Tensorflow computation graph.
+
+```
+topology = graph.is_topology()
+```
+
+performs topological sorting on the internal graph representation, and the code inside the loop:
+
+```
+for _, topo_i in enumerate(topology):
+```
+
+performs actually conversion that maps each layer to a part in Tensorflow computation graph.
+
+### The tuner
+
+The tuner is much more simple than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, the most important class in which is `CustomerTuner`:
+
+```
+class CustomerTuner(Tuner):
+    # ......
+
+    def generate_parameters(self, parameter_id):
+        """Returns a set of trial graph config, as a serializable object.
+        parameter_id : int
+        """
+        if len(self.population) <= 0:
+            logger.debug("the len of poplution lower than zero.")
+            raise Exception('The population is empty')
+        pos = -1
+        for i in range(len(self.population)):
+            if self.population[i].result == None:
+                pos = i
+                break
+        if pos != -1:
+            indiv = copy.deepcopy(self.population[pos])
+            self.population.pop(pos)
+            temp = json.loads(graph_dumps(indiv.config))
+        else:
+            random.shuffle(self.population)
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+            self.population.pop(1)
+            indiv.mutation()
+            graph = indiv.config
+            temp = json.loads(graph_dumps(graph))
+
+    # ......
+```
+
+As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
+
+```
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+```
+
+controls the mutation process. It will always take two random individuals in the population, only keeping and mutating the one with better result.
+
+## Model configuration format
+
+Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
+
+```
+{
+    "max_layer_num": 50,
+    "layers": [
+        {
+            "input_size": 0,
+            "type": 3,
+            "output_size": 1,
+            "input": [],
+            "size": "x",
+            "output": [4, 5],
+            "is_delete": false
+        },
+        {
+            "input_size": 0,
+            "type": 3,
+            "output_size": 1,
+            "input": [],
+            "size": "y",
+            "output": [4, 5],
+            "is_delete": false
+        },
+        {
+            "input_size": 1,
+            "type": 4,
+            "output_size": 0,
+            "input": [6],
+            "size": "x",
+            "output": [],
+            "is_delete": false
+        },
+        {
+            "input_size": 1,
+            "type": 4,
+            "output_size": 0,
+            "input": [5],
+            "size": "y",
+            "output": [],
+            "is_delete": false
+        },
+        {"Comment": "More layers will be here for actual graphs."}
+    ]
+}
+```
+
+Every model configuration will has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
+
+ * `type` is the type of the layer. 0, 1, 2, 3, 4 corresponde to attention, self-attention, RNN, input and output layer respectively.
+ * `size` is the length of the output. "x", "y" corresponde to document length / question length, respectively.
+ * `input_size` is the number of inputs the layer has.
+ * `input` is the indices of layers taken as input of this layer.
+ * `output` is the indices of layers use this layer's output as their input.
+ * `is_delete` means whether the layer is still available.
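
The field list in the README's "Model configuration format" section can be read back with a few lines of Python. The snippet below is only a minimal sketch: it maps each `type` code to the layer kind the README names and lists the layers that are not marked deleted. The names `LAYER_KINDS`, `summarize_layers` and the trimmed `sample_config` are assumptions made for illustration; they are not taken from `graph.py` or from the NNI API.

```python
import json

# Type codes as listed in the README: 0..4 (the dict itself is illustrative).
LAYER_KINDS = {
    0: "attention",
    1: "self-attention",
    2: "rnn",
    3: "input",
    4: "output",
}

# Hypothetical two-layer configuration in the README's format.
sample_config = """
{
    "max_layer_num": 50,
    "layers": [
        {"input_size": 0, "type": 3, "output_size": 1, "input": [],
         "size": "x", "output": [1], "is_delete": false},
        {"input_size": 1, "type": 4, "output_size": 0, "input": [0],
         "size": "x", "output": [], "is_delete": false},
        {"Comment": "placeholder entry without a type"}
    ]
}
"""


def summarize_layers(config_text):
    """Return (index, kind, inputs, outputs) for each layer still available."""
    config = json.loads(config_text)
    summary = []
    for index, layer in enumerate(config["layers"]):
        # Skip placeholder objects such as the README's "Comment" entry.
        if "type" not in layer:
            continue
        # Skip layers flagged as deleted.
        if layer.get("is_delete", False):
            continue
        kind = LAYER_KINDS.get(layer["type"], "unknown")
        summary.append((index, kind, layer["input"], layer["output"]))
    return summary


for index, kind, inputs, outputs in summarize_layers(sample_config):
    print(index, kind, "inputs:", inputs, "outputs:", outputs)
```

Keeping the code-to-name mapping in one dictionary means a new layer type would need only one extra entry in this sketch; how the real trial code dispatches on layer types is defined by `graph.py` and `graph_to_tf.py`, not by this example.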