
Training container for NAS Envelopenet #429

Merged
merged 3 commits on May 9, 2019

Conversation

garganubhav
Contributor

@garganubhav garganubhav commented Mar 12, 2019

This PR has all files for the training and evaluation required in NAS Envelopenet Suggestion service.



@@ -0,0 +1,238 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
Contributor

You probably don't need the future imports since you're using Python 3 anyway. Same goes for every import like this.


def cell(self, inputs, arch, is_training):
"""Create the cell by instantiating the cell blocks"""
nscope = 'Cell_' + self.cellname + '_' + str(self.cellidx)
Contributor

`.format` should be used instead of string concatenation.
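A minimal sketch of the suggested change, with the variable values assumed for illustration:

```python
cellname = "Envelope"  # hypothetical value of self.cellname
cellidx = 3            # hypothetical value of self.cellidx

# Before: concatenation, requiring an explicit str() for the int
nscope = 'Cell_' + cellname + '_' + str(cellidx)

# After: str.format handles the int conversion itself
nscope = 'Cell_{}_{}'.format(cellname, cellidx)

print(nscope)  # Cell_Envelope_3
```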

By default use stride=1 and SAME padding
"""
dropout_keep_prob = 0.8
nscope = 'Cell_' + self.cellname + '_' + str(self.cellidx)
Contributor

`.format` (then there is no need for `str()`)

net = tf.Print(
net,
[msss],
message="MeanSSS=:" +
Contributor

.format

self.batch_size = self.task_config["batch_size"]
self.num_examples = 10000
self.run_once = True
self.eval_dir = self.task_config["data_dir"] + "/results/" + \
Contributor

Use `os.path.join()` instead, to take care of the "/" separator.
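For example (directory and trial names are assumed, not taken from the actual config):

```python
import os

data_dir = "/data"   # hypothetical value of task_config["data_dir"]
trial = "trial_0"    # hypothetical trailing path component

# Before: manual concatenation, fragile if data_dir already ends in "/"
eval_dir = data_dir + "/results/" + trial

# After: os.path.join inserts separators as needed
eval_dir = os.path.join(data_dir, "results", trial)

print(eval_dir)  # /data/results/trial_0 (on POSIX)
```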


# Build a Graph that computes the logits predictions from the
# inference model.
# TODO: Clean up all args
Contributor

Get rid of TODOs since I can't see a WIP status.

self.get_params(params)

def get_params(self, params):
global global_batch_size
Contributor

Should they be global and not class attributes?
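A sketch of the reviewer's suggestion, with the class and default value invented for illustration:

```python
class Trainer:
    """Hypothetical trainer holding batch_size as an instance attribute."""

    def __init__(self, params):
        self.get_params(params)

    def get_params(self, params):
        # Instance attribute instead of `global global_batch_size`:
        # state stays scoped to the object and is visible to other methods.
        self.batch_size = params.get("batch_size", 128)

t = Trainer({"batch_size": 64})
print(t.batch_size)  # 64
```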

filepath = os.path.join(dest_directory, filename)
if not os.path.exists(filepath):
def _progress(count, block_size, total_size):
sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
Contributor

Do you really need a progress bar?

self._step = self.global_step_init
self._start_time = time.time()

def before_run(self, run_context):
Contributor

@Akado2009 Akado2009 Mar 12, 2019

`run_context` is never used; the function is actually never called.

from __future__ import print_function

from datetime import datetime
import ast
Contributor

Is it used anywhere?

with tf.variable_scope(nscope, 'initial_block', [inputs], reuse=reuse) as scope:
with slim.arg_scope([slim.conv2d, slim.max_pool2d], stride=1, padding='SAME'):
net = inputs
layeridx = 0
Contributor

Do you need this? As far as I can see, you only increment it.

outputs):
self.cellidx = cellidx
self.log_stats = log_stats
self.res=sys.argv[2]
Contributor

never used

self.log_stats = log_stats
self.res=sys.argv[2]
self.cellname = "Envelope"
self.numbranches = 4
Contributor

never used

self.numbranches = 4
self.numbins = 100
self.batchsize = int(net.shape[0])
numfilters = len(filters)
Contributor

never used

scope)
return softmax_linear

def maybe_download_and_extract(self):
Contributor

Is this actually used? I see the actual line calling it is commented out.

is_training=True,
scope='Nacnet'):
net = self.add_init(inputs, initcell, is_training)
end_points = {}
Contributor

Not used? The line using it is commented out.

self.global_step_init = global_step_init
self.loss = loss

def begin(self):
Contributor

never used

# Asks for loss value.
return tf.train.SessionRunArgs(self.loss)

def after_run(self, run_context, run_values):
Contributor

function is never called

Contributor Author

These are required by the LoggerHook call during training.

mon_sess.run(train_op)

def evaluate():
eval=Evaluate(self.arch, self.params, self.train_dir)
Contributor

never used?

Contributor Author

It's called from run_trail.py.

@garganubhav
Contributor Author

@richardsliu @Akado2009 plz merge

@gaocegege
Member

Is the suggestion trained for one experiment (or search space)?

Or is it trained once, and can be used for all experiments?

@garganubhav
Contributor Author

@gaocegege It is trained for every experiment. As there is no controller like in RL, the only training required is of the architectures being sampled, and even that is truncated, i.e., 10 epochs.

@gaocegege
Member

Our suggestion service is now long-running, so I am not sure we can support this case. But the suggestion itself LGTM. Thanks for your contribution!

@richardsliu richardsliu self-assigned this Apr 25, 2019
Contributor

@richardsliu richardsliu left a comment

Can you add a README file that explains:

  • How this code works
  • How to run/debug locally, if possible
  • Links to the design doc and/or paper

Without prior knowledge of NAS, it is hard to follow.

Also do you have unit tests?

@garganubhav
Contributor Author

@gaocegege In fact it's the fastest NAS algorithm, even faster than the RL algorithm already in Katib. If you change the 'steps' parameter to 10 in the YAML file, it will complete in ~15 min. Also, you might want to run it on a GPU.

@garganubhav
Contributor Author

@richardsliu done

@garganubhav
Contributor Author

@richardsliu Plz merge

@richardsliu
Contributor

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardsliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@garganubhav
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit b76c9c1 into kubeflow:master May 9, 2019
5 participants