Weights returned as nan #4

Open · ghost opened this issue Jan 5, 2017 · 3 comments

ghost commented Jan 5, 2017

Hi,
I have a simple Keras CNN that works fine as it is. When I try to apply weightnorm, with either SGD or Adam, the weights after the first update always come back as NaN, triggering an error.
This is an example of one layer's weights just before model.fit():
```
{'W_constraint': None, 'b_constraint': None, 'name': 'dense_3', 'activity_regularizer': None,
 'trainable': True, 'init': 'glorot_uniform', 'bias': True, 'activation': 'softmax',
 'input_dim': None, 'b_regularizer': None, 'W_regularizer': None, 'output_dim': 8}
[array([[-1.81446958, -0.74279195, -1.98372281,  1.03149867, -1.33605921,  0.98080444,  1.46184123, -1.90489924],
        [ 1.74007297,  1.18310583,  0.96353596, -0.49502602, -1.5556761 ,  1.71657765,  0.94695097,  2.61784649],
        [-0.638098  , -1.65658796,  0.45535672,  1.39707041, -0.53299773, -1.73198462,  0.05106336, -0.93136811],
        [-0.50413573, -0.12023554, -1.1118933 , -1.12377524,  1.9663564 ,  1.5819149 , -1.72357309, -0.63662446],
        [-1.6616931 ,  1.57845461, -1.33607149,  1.03262866,  1.02465236, -1.82984507, -1.94427574,  2.13097382],
        [-0.69643229, -1.69655061,  1.86963248,  1.35395622,  1.43264794, -1.60058153,  1.45158744,  1.88503206],
        [-0.1455002 ,  0.44617018, -0.47829607, -1.31520915,  1.82627797,  1.81214976, -0.27336141,  1.91040981],
        [-0.78067726,  1.90638936, -1.97633493, -1.061988  ,  0.02862636, -0.37745535,  1.65916157,  0.70244253],
        [-0.21252237, -0.65053529,  0.51744008,  0.68950123, -1.85650849,  1.0682615 ,  1.55790281, -0.83147609],
        [ 0.48371872, -0.85853142, -2.022681  , -1.08805192,  2.06113982, -0.57459891, -1.63607311, -0.83574378],
        [ 1.05208552, -1.69211721, -0.43760285,  1.03213108, -2.36395407, -1.02809763, -0.806862  , -1.45331335],
        [-1.12855673,  1.70107543,  1.35683572, -1.20369387, -0.18256012,  2.01939988,  1.03289509,  2.65198541],
        [ 0.51740509, -0.23014481,  1.95300198, -0.66845942,  0.53607529, -1.01613665,  1.18222928, -0.80191672],
        [ 0.39752519,  2.14175916,  1.48441279, -1.20377731, -1.87403321, -0.11191524, -1.76513219,  2.63831162],
        [-1.98938465, -1.2327646 , -0.83744407, -0.64946407,  0.58288223,  2.24985504, -0.09591354,  2.01949072],
        [-1.42328095,  2.07457638, -1.33132982, -2.08888173,  1.02181983,  1.24852037,  1.10853899, -1.0029546 ],
        [ 1.75405586,  0.09432141, -1.31112003, -0.0304644 , -1.5135988 , -1.49612296,  1.2762996 ,  0.60811853],
        [-1.64439476,  1.7335813 , -0.80541438,  0.27505419,  0.37458628,  0.72816306,  1.52508533,  1.85929   ],
        [-0.053883  , -2.13568377,  0.55463415,  0.43602318,  1.61183143,  1.48652506, -2.10601187, -1.08352566],
        [-1.21685481,  0.41039792, -0.78186649,  1.60308003,  0.99902558,  1.60311925,  1.10065258,  0.0354073 ],
        [ 2.12806535,  2.14419603,  0.96948087,  0.08199508, -0.84324813, -1.50271273,  0.10528874, -0.873142  ],
        [-2.15096569,  1.23474431,  1.25909293, -0.44441026, -2.08873248,  0.21763401, -2.12321043, -1.31675696],
        [ 1.95354533,  1.73437381,  1.38008749,  1.28455055, -0.34766021, -2.20302415,  0.51172131, -1.0840373 ],
        [ 1.58691943,  1.4111464 , -2.16242433,  1.90826643, -1.84906268, -1.18959498, -1.83963597, -0.12747419],
        [-0.4401913 ,  1.22723794, -1.53341997,  1.43126631, -0.95519918,  0.61142218,  1.61414647, -0.13954096],
        [-0.63068312,  1.03541517,  2.19619155, -0.71226257,  1.70391488,  2.243999  ,  1.81045079, -1.39369321],
        [ 0.22400506,  0.17860785, -1.42312717,  0.74690318,  0.66468042, -1.62544048,  1.75782633,  1.03065538],
        [ 2.11632895,  2.12409687,  1.10879564,  1.02491808, -0.37185353,  0.13456514, -1.70119786, -0.14151937],
        [-0.58504152,  2.31315374,  0.15611638,  1.2988714 ,  1.33584034,  0.29542622, -1.18843138,  0.54929841],
        [ 0.84831744, -2.25127149, -0.42340177, -0.99950933, -0.33759385,  0.73217863, -1.75246251, -0.20512277],
        [ 1.16061187, -1.81038654, -1.50839853,  1.90214121, -0.33019581, -1.18630064, -0.29908586, -1.13772762],
        [-0.85308987, -0.56074762, -0.22539173, -0.95188016, -0.25569537,  1.48671508, -0.4336201 ,  2.44569182]], dtype=float32),
 array([-0.75807816, -0.68674487, -0.79544491, -0.73615742, -0.74876821, -0.73147482, -0.74654377, -0.72675341], dtype=float32)]
```
and these are the weights after 1 epoch:
```
[[ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 ...,
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]]
```

It's the same for all layers.
The data_based_init() works fine, by the way.
Any clue what could be happening?
I am using TF v12 with CUDA 8 and a GeForce 1080 GPU.
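
For reference, here is a minimal sketch of how one might narrow down where the NaNs first appear, assuming Keras 1's `LambdaCallback`; `model`, `x_train`, and `y_train` are placeholders for the actual compiled model and training data:

```python
import numpy as np
from keras.callbacks import LambdaCallback

# Sketch: after every batch, report the first layer whose weights contain NaN.
def report_nans(batch, logs):
    for layer in model.layers:
        if any(np.isnan(w).any() for w in layer.get_weights()):
            print('NaN weights in layer %s after batch %d' % (layer.name, batch))
            break

nan_probe = LambdaCallback(on_batch_end=report_nans)
model.fit(x_train, y_train, nb_epoch=1, callbacks=[nan_probe])
```

If the NaNs already show up after the very first batch, as described above, the problem is in the first update step rather than in a gradual divergence.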

ghost changed the title from "Wegihts returned as nan" to "Weights returned as nan" on Jan 5, 2017
hefeiwangyande commented

Hello, I have the same trouble as you. When I use WN only to decompose W into g and v, the classification task runs normally (though the accuracy is not high). But when I set g and b according to the parameter initialization in the paper, the NaNs appear.
The code is as follows:

```python
import tensorflow as tf
# get_var_maybe_avg is the EMA-aware variable getter from the reference nn.py

def conv2d(x, num_filters, filter_size=[3,3], stride=[1,1], pad='SAME',
           nonlinearity=None, init_scale=1., ema=None, **kwargs):
    ''' convolutional layer '''
    with tf.variable_scope('conv2d'):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters],
                            tf.float32, tf.random_normal_initializer(0, 0.05),
                            trainable=True)
        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_init = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)
        g_u = tf.assign(g, g * scale_init)
        b_u = tf.assign_add(b, -m_init * scale_init)
        # with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
        #     g = tf.get_variable('g', dtype=tf.float32, initializer=scale_init, trainable=True)
        #     b = tf.get_variable('b', dtype=tf.float32, initializer=-m_init * scale_init, trainable=True)
        x_init = tf.reshape(g_u, [1, 1, 1, num_filters]) * x_init + tf.reshape(b_u, [1, 1, 1, num_filters])
        x = tf.nn.l2_normalize(x_init, dim=[0, 1, 2])
        if nonlinearity is not None:
            x = nonlinearity(x)
        return x
```
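
A plausible cause: `g_u` and `b_u` are `tf.assign` ops wired directly into the forward graph, so the data-dependent rescaling re-runs on every training step and `g` gets multiplied by `scale_init` again and again until it overflows. The trailing `tf.nn.l2_normalize` on the activations also looks unintended, since weight normalization normalizes `V`, not the layer output. Below is a minimal sketch of the one-time-init pattern (in the style of OpenAI's pixel-cnn `nn.py`), where an `init` flag separates the data-dependent initialization pass from the training graph; the function name and flag here are illustrative:

```python
import tensorflow as tf

def conv2d_wn(x, num_filters, init=False, filter_size=[3, 3], stride=[1, 1],
              pad='SAME', init_scale=1.):
    ''' weight-normalized conv2d; build once with init=True and run it on a
        single batch, then build the training graph with init=False '''
    with tf.variable_scope('conv2d_wn'):
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters],
                            tf.float32, tf.random_normal_initializer(0, 0.05))
        g = tf.get_variable('g', [num_filters], tf.float32,
                            initializer=tf.constant_initializer(1.))
        b = tf.get_variable('b', [num_filters], tf.float32,
                            initializer=tf.constant_initializer(0.))
        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_out = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)
        if init:
            # data-dependent init: rescale g and shift b exactly once, then
            # return the normalized pre-activations for this init batch
            m_init, v_init = tf.nn.moments(x_out, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-8)
            with tf.control_dependencies([g.assign(g * scale_init),
                                          b.assign_add(-m_init * scale_init)]):
                return tf.reshape(scale_init, [1, 1, 1, num_filters]) * (
                    x_out - tf.reshape(m_init, [1, 1, 1, num_filters]))
        # training/inference path: no assign ops anywhere in the graph
        return tf.reshape(g, [1, 1, 1, num_filters]) * x_out + \
               tf.reshape(b, [1, 1, 1, num_filters])
```

With this split, the assigns execute only during the single `init=True` session run (the training graph is then built with `init=False` and `reuse=True` on the variable scope), so `g` is scaled once rather than compounding every batch.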



wkirgsn commented Sep 19, 2018

Same problem here with Keras 2 (incorporating the pull request).
No data-based init applied; I'm using weight norm for a single-layer GRU model.
LSTM works fine, so I guess it has something to do with the initialization of the weights.
