Weights returned as nan #4

Open · ghost opened this issue Jan 5, 2017 · 3 comments

ghost commented Jan 5, 2017

Hi,
I have a simple Keras CNN that works fine as it is. When I try to apply weightnorm, with either SGD or Adam, the weights after the first update always come back as NaN, triggering an error.
This is an example of one layer's weights just before model.fit():
```
{'W_constraint': None, 'b_constraint': None, 'name': 'dense_3', 'activity_regularizer': None,
 'trainable': True, 'init': 'glorot_uniform', 'bias': True, 'activation': 'softmax',
 'input_dim': None, 'b_regularizer': None, 'W_regularizer': None, 'output_dim': 8}
[array([[-1.81446958, -0.74279195, -1.98372281,  1.03149867, -1.33605921,  0.98080444,  1.46184123, -1.90489924],
        [ 1.74007297,  1.18310583,  0.96353596, -0.49502602, -1.5556761 ,  1.71657765,  0.94695097,  2.61784649],
        [-0.638098  , -1.65658796,  0.45535672,  1.39707041, -0.53299773, -1.73198462,  0.05106336, -0.93136811],
        [-0.50413573, -0.12023554, -1.1118933 , -1.12377524,  1.9663564 ,  1.5819149 , -1.72357309, -0.63662446],
        [-1.6616931 ,  1.57845461, -1.33607149,  1.03262866,  1.02465236, -1.82984507, -1.94427574,  2.13097382],
        [-0.69643229, -1.69655061,  1.86963248,  1.35395622,  1.43264794, -1.60058153,  1.45158744,  1.88503206],
        [-0.1455002 ,  0.44617018, -0.47829607, -1.31520915,  1.82627797,  1.81214976, -0.27336141,  1.91040981],
        [-0.78067726,  1.90638936, -1.97633493, -1.061988  ,  0.02862636, -0.37745535,  1.65916157,  0.70244253],
        [-0.21252237, -0.65053529,  0.51744008,  0.68950123, -1.85650849,  1.0682615 ,  1.55790281, -0.83147609],
        [ 0.48371872, -0.85853142, -2.022681  , -1.08805192,  2.06113982, -0.57459891, -1.63607311, -0.83574378],
        [ 1.05208552, -1.69211721, -0.43760285,  1.03213108, -2.36395407, -1.02809763, -0.806862  , -1.45331335],
        [-1.12855673,  1.70107543,  1.35683572, -1.20369387, -0.18256012,  2.01939988,  1.03289509,  2.65198541],
        [ 0.51740509, -0.23014481,  1.95300198, -0.66845942,  0.53607529, -1.01613665,  1.18222928, -0.80191672],
        [ 0.39752519,  2.14175916,  1.48441279, -1.20377731, -1.87403321, -0.11191524, -1.76513219,  2.63831162],
        [-1.98938465, -1.2327646 , -0.83744407, -0.64946407,  0.58288223,  2.24985504, -0.09591354,  2.01949072],
        [-1.42328095,  2.07457638, -1.33132982, -2.08888173,  1.02181983,  1.24852037,  1.10853899, -1.0029546 ],
        [ 1.75405586,  0.09432141, -1.31112003, -0.0304644 , -1.5135988 , -1.49612296,  1.2762996 ,  0.60811853],
        [-1.64439476,  1.7335813 , -0.80541438,  0.27505419,  0.37458628,  0.72816306,  1.52508533,  1.85929   ],
        [-0.053883  , -2.13568377,  0.55463415,  0.43602318,  1.61183143,  1.48652506, -2.10601187, -1.08352566],
        [-1.21685481,  0.41039792, -0.78186649,  1.60308003,  0.99902558,  1.60311925,  1.10065258,  0.0354073 ],
        [ 2.12806535,  2.14419603,  0.96948087,  0.08199508, -0.84324813, -1.50271273,  0.10528874, -0.873142  ],
        [-2.15096569,  1.23474431,  1.25909293, -0.44441026, -2.08873248,  0.21763401, -2.12321043, -1.31675696],
        [ 1.95354533,  1.73437381,  1.38008749,  1.28455055, -0.34766021, -2.20302415,  0.51172131, -1.0840373 ],
        [ 1.58691943,  1.4111464 , -2.16242433,  1.90826643, -1.84906268, -1.18959498, -1.83963597, -0.12747419],
        [-0.4401913 ,  1.22723794, -1.53341997,  1.43126631, -0.95519918,  0.61142218,  1.61414647, -0.13954096],
        [-0.63068312,  1.03541517,  2.19619155, -0.71226257,  1.70391488,  2.243999  ,  1.81045079, -1.39369321],
        [ 0.22400506,  0.17860785, -1.42312717,  0.74690318,  0.66468042, -1.62544048,  1.75782633,  1.03065538],
        [ 2.11632895,  2.12409687,  1.10879564,  1.02491808, -0.37185353,  0.13456514, -1.70119786, -0.14151937],
        [-0.58504152,  2.31315374,  0.15611638,  1.2988714 ,  1.33584034,  0.29542622, -1.18843138,  0.54929841],
        [ 0.84831744, -2.25127149, -0.42340177, -0.99950933, -0.33759385,  0.73217863, -1.75246251, -0.20512277],
        [ 1.16061187, -1.81038654, -1.50839853,  1.90214121, -0.33019581, -1.18630064, -0.29908586, -1.13772762],
        [-0.85308987, -0.56074762, -0.22539173, -0.95188016, -0.25569537,  1.48671508, -0.4336201 ,  2.44569182]], dtype=float32),
 array([-0.75807816, -0.68674487, -0.79544491, -0.73615742, -0.74876821, -0.73147482, -0.74654377, -0.72675341], dtype=float32)]
```
and these are the weights after 1 epoch:
```
[[ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 ...,
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]]
```

It's the same for all layers.
The data_based_init() works fine, by the way.
Any clue what could be happening?
I am using TF v12 with CUDA 8 and a GeForce 1080 GPU.
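
For reference, here is a minimal sketch of how one might narrow down where the NaNs first appear, assuming Keras 1's `LambdaCallback`; `model`, `x_train`, and `y_train` are placeholders for the actual compiled model and training data:

```python
import numpy as np
from keras.callbacks import LambdaCallback

# Sketch: after every batch, report the first layer whose weights contain NaN.
def report_nans(batch, logs):
    for layer in model.layers:
        if any(np.isnan(w).any() for w in layer.get_weights()):
            print('NaN weights in layer %s after batch %d' % (layer.name, batch))
            break

nan_probe = LambdaCallback(on_batch_end=report_nans)
model.fit(x_train, y_train, nb_epoch=1, callbacks=[nan_probe])
```

If the NaNs already show up after the very first batch, as described above, the problem is in the first update step rather than in a gradual divergence.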

ghost changed the title from "Wegihts returned as nan" to "Weights returned as nan" on Jan 5, 2017
hefeiwangyande commented

Hello, I have the same trouble as you. When I use WN only to decompose W into g and v, the classification task runs normally (though the accuracy is not high). But when I set g and b according to the parameter initialization in the paper, the NaNs appear.
The code is as follows:

```python
import tensorflow as tf
# get_var_maybe_avg is the EMA-aware variable getter from the reference nn.py

def conv2d(x, num_filters, filter_size=[3,3], stride=[1,1], pad='SAME',
           nonlinearity=None, init_scale=1., ema=None, **kwargs):
    ''' convolutional layer '''
    with tf.variable_scope('conv2d'):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters],
                            tf.float32, tf.random_normal_initializer(0, 0.05),
                            trainable=True)
        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_init = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)
        g_u = tf.assign(g, g * scale_init)
        b_u = tf.assign_add(b, -m_init * scale_init)
        # with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
        #     g = tf.get_variable('g', dtype=tf.float32, initializer=scale_init, trainable=True)
        #     b = tf.get_variable('b', dtype=tf.float32, initializer=-m_init * scale_init, trainable=True)
        x_init = tf.reshape(g_u, [1, 1, 1, num_filters]) * x_init + tf.reshape(b_u, [1, 1, 1, num_filters])
        x = tf.nn.l2_normalize(x_init, dim=[0, 1, 2])
        if nonlinearity is not None:
            x = nonlinearity(x)
        return x
```
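
A plausible cause: `g_u` and `b_u` are `tf.assign` ops wired directly into the forward graph, so the data-dependent rescaling re-runs on every training step and `g` gets multiplied by `scale_init` again and again until it overflows. The trailing `tf.nn.l2_normalize` on the activations also looks unintended, since weight normalization normalizes `V`, not the layer output. Below is a minimal sketch of the one-time-init pattern (in the style of OpenAI's pixel-cnn `nn.py`), where an `init` flag separates the data-dependent initialization pass from the training graph; the function name and flag here are illustrative:

```python
import tensorflow as tf

def conv2d_wn(x, num_filters, init=False, filter_size=[3, 3], stride=[1, 1],
              pad='SAME', init_scale=1.):
    ''' weight-normalized conv2d; build once with init=True and run it on a
        single batch, then build the training graph with init=False '''
    with tf.variable_scope('conv2d_wn'):
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters],
                            tf.float32, tf.random_normal_initializer(0, 0.05))
        g = tf.get_variable('g', [num_filters], tf.float32,
                            initializer=tf.constant_initializer(1.))
        b = tf.get_variable('b', [num_filters], tf.float32,
                            initializer=tf.constant_initializer(0.))
        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_out = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)
        if init:
            # data-dependent init: rescale g and shift b exactly once, then
            # return the normalized pre-activations for this init batch
            m_init, v_init = tf.nn.moments(x_out, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-8)
            with tf.control_dependencies([g.assign(g * scale_init),
                                          b.assign_add(-m_init * scale_init)]):
                return tf.reshape(scale_init, [1, 1, 1, num_filters]) * (
                    x_out - tf.reshape(m_init, [1, 1, 1, num_filters]))
        # training/inference path: no assign ops anywhere in the graph
        return tf.reshape(g, [1, 1, 1, num_filters]) * x_out + \
               tf.reshape(b, [1, 1, 1, num_filters])
```

With this split, the assigns execute only during the single `init=True` session run (the training graph is then built with `init=False` and `reuse=True` on the variable scope), so `g` is scaled once rather than compounding every batch.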



wkirgsn commented Sep 19, 2018

Same problem here with Keras 2 (incorporating the pull request).
No data-based init applied; I'm using weight norm for a single-layer GRU model.
LSTM works fine, so I guess it has something to do with the initialization of the weights.
