The output of conv2d should be updated after g and b are updated in data dependent initialization. #36

bfs18 · 2018-06-18T17:28:25Z

The initial values of g and b are chosen to keep the pre-activation values normalized (zero mean and unit variance per channel). Once the tf.assign operations on g and b have run, the output of the current conv2d layer changes, and so does the input to the next layer. I think the initialization of g and b for the next layer should depend on this new conv2d output.
So I think the customized conv2d in nn.py should be modified as follows:

def conv2d(x_, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME', nonlinearity=None, init_scale=1., counters={},
           init=False, ema=None, **kwargs):
    ''' convolutional layer '''
    name = get_name('conv2d', counters)
    with tf.variable_scope(name):
        V = get_var_maybe_avg('V', ema, shape=filter_size + [int(x_.get_shape()[-1]), num_filters], dtype=tf.float32,
                              initializer=tf.random_normal_initializer(0, 0.05), trainable=True)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)

        # use weight normalization (Salimans & Kingma, 2016)
        W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])

        # calculate convolutional layer output
        x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        if init:  # normalize x
            m_init, v_init = tf.nn.moments(x, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-10)
            with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
                # instead of x = tf.identity(x), recompute W and x from the updated g and b
                W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])
                x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        # apply nonlinearity
        if nonlinearity is not None:
            x = nonlinearity(x)

        return x
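
To illustrate the point outside TensorFlow, here is a minimal pure-Python sketch of the same idea on a single weight-normalized scalar unit. It is not the nn.py code: the helper names (moments, layer, data_dependent_init) and the toy batch are hypothetical, and only the update rule for g and b mirrors the snippet above. It compares the output computed before the update ("stale") with the output recomputed afterwards ("fresh").

```python
import math
import random


def moments(values):
    """Return (mean, population variance) of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def layer(inputs, v, g, b):
    """Weight-normalized scalar unit: y = g * (v . x) / ||v|| + b."""
    norm = math.sqrt(sum(w * w for w in v))
    return [g * sum(w * xi for w, xi in zip(v, x)) / norm + b for x in inputs]


def data_dependent_init(inputs, v, init_scale=1.0):
    """Return (g, b, stale_output, fresh_output) for one unit."""
    g, b = 1.0, 0.0
    stale = layer(inputs, v, g, b)  # output computed before g and b change
    m_init, v_init = moments(stale)
    scale_init = init_scale / math.sqrt(v_init + 1e-10)
    # Same update as g.assign(g * scale_init), b.assign_add(-m_init * scale_init)
    g, b = g * scale_init, b - m_init * scale_init
    fresh = layer(inputs, v, g, b)  # output recomputed with the updated g and b
    return g, b, stale, fresh


# Toy batch (assumption): 256 samples with 4 features, mean 3 and std 2.
rng = random.Random(0)
batch = [[rng.gauss(3.0, 2.0) for _ in range(4)] for _ in range(256)]
v = [rng.gauss(0.0, 0.05) for _ in range(4)]

g, b, stale, fresh = data_dependent_init(batch, v)
stale_mean, stale_var = moments(stale)
fresh_mean, fresh_var = moments(fresh)
print("stale mean/var:", stale_mean, stale_var)
print("fresh mean/var:", fresh_mean, fresh_var)
```

Only the recomputed output is normalized. If the stale output were passed on, the next layer would compute its own g and b from statistics that do not match what it will actually receive after initialization.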

harsh306 · 2018-06-18T17:57:57Z

I have a simple question: what is the idea behind having different weight normalization "flows" when init=True vs. init=False?

bfs18 · 2018-06-19T02:06:57Z

@harsh306 This is the data dependent initialization for g and b. You can find the details in the Weight Normalization paper.
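
For reference, the update performed in the init=True branch of the snippet in this issue can be written as follows. The symbols are shorthand introduced here, not notation from the thread: t is the pre-activation computed before the update, μ and σ² are its per-channel batch mean and variance (m_init and v_init), s is init_scale, and ε = 1e-10.

```latex
g \leftarrow \frac{s\, g}{\sqrt{\sigma^2 + \epsilon}}, \qquad
b \leftarrow b - \frac{s\, \mu}{\sqrt{\sigma^2 + \epsilon}}
% With the initial values g = 1 and b = 0, recomputing the output gives
y = \frac{s\,(t - \mu)}{\sqrt{\sigma^2 + \epsilon}}
```

With s = 1 and ε ignored, this reduces to g = 1/σ and b = -μ/σ, so the recomputed output has zero mean and unit variance per channel on the initialization batch. When init=False, the layer simply uses the stored g and b.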

bfs18 mentioned this issue Jun 19, 2018

fix the data dependent initializaiton. #37

Merged
